Abstract: We may want to keep sensitive information in
a relational database hidden from a user or group thereof. 
We characterize sensitive data as the extensions of secrecy 
views. The database, before returning the answers to a query 
posed by a restricted user, is updated to make the secrecy 
views empty or a single tuple with null values. Then, a query 
about any of those views returns no meaningful information. 
Since the database is not supposed to be physically changed 
for this purpose, the updates are only virtual, and also 
minimal. Minimality makes sure that query answers, while 
being privacy preserving, are also maximally informative. 
The virtual updates are based on null values as used in the 
SQL standard. We provide the semantics of secrecy views, 
virtual updates, and secret answers to queries. The different 
instances resulting from the virtual updates are specified as
the models of a logic program with stable model semantics, 
which becomes the basis for computation of the secret answers. 

Index Terms — Data privacy, views, query answering, null 
values, view updates, answer set programs, database repairs. 



I. Introduction 

Database management systems allow for massive storage 
of data, which can be efficiently accessed and manipulated. 
However, at the same time, the problems of data privacy are 
becoming increasingly important and difficult to handle. For 
example, for commercial or legal reasons, administrators of 
sensitive information may not want or be allowed to release 
certain portions of the data. It becomes crucial to address 
database privacy issues. 

In this scenario, certain users should have access to 
only certain portions of a database. Preferably, what a 
particular user (or class of them) is allowed or not allowed 
to access should be specified in a declarative manner. This 
specification should be used by the database engine when 
queries are processed and answered. We would expect the 
database to return answers that do not reveal anything that 
should be kept protected from a particular user. On the other 
side and at the same time, the database should return as 
informative answers as possible once the privacy conditions 
have been taken care of. 

Some recent papers approach data privacy and access
control on the basis of authorization views [27], [33].
View-based data privacy usually approaches the problem
by specifying which views a user is allowed to access.
For example, when the database receives a query from the
user, it checks whether the query can be answered using those
views alone, that is, whether the query can be rewritten
in terms of the views for every possible instance [27]. If no
complete rewriting is possible, the query is rejected. In [33],
the existence of a conditional rewriting, i.e. one relative
to the instance at hand, is investigated.

Our approach to the data protection problem is based 
on specifications of what users are not allowed to access 
through query answers, which is quite natural. Data owners
usually have a clearer picture of the data that are
sensitive than of the data that can be publicly
released. Dealing with our problem as "the complement"
of the problem formulated in terms of authorization views
is not natural, and not necessarily easy, since complements
of database views would be involved [20], [21].

According to our approach, the information to be protected
is declared as a secrecy view, or a collection of
them. Their extensions have to be kept secret. Each user
or class of them may have an associated set of secrecy
views. When a user poses a query to the database, the
system virtually updates some of the attribute values on 
the basis of the secrecy views associated to that user. In 
this work, we consider updates that modify attribute values 
through null values, which are commonly used to represent 
missing or unknown values in incomplete databases. As a 
consequence, in each of the resulting updated instances, 
the extension of each of the secrecy views either becomes 
empty or contains a single tuple showing only null values. 
Either way, we say that the secrecy view becomes null. 
Then, the original query is posed to the resulting class of 
updated instances. This amounts to: (a) Posing the query to 
each instance in the class, (b) Answering it as usual from 
each of them, (c) Collecting the answers that are shared by 
all the instances in the class. In this way, the system will 
return answers to the query that do not reveal the secret 
data. The next example illustrates the gist of our approach. 

Example 1. Consider the following relational database D: 



    Marks   studentID   courseID   mark
            001         01         56
            001         02         90
            002         02         70



The secrecy view $V_s$ defined below specifies that a student
with her course mark must be kept secret when the mark
is less than 60:

$V_s(sid, cid, mark) \leftarrow Marks(sid, cid, mark),\ mark < 60.$ $^{1}$

The view extension on the given instance is $V_s(D) =
\{(001, 01, 56)\}$, which is not null. Now, a user subject
to this secrecy view wants to obtain the students' marks,
posing the following query:

$Q(sid, cid, mark) \leftarrow Marks(sid, cid, mark).$ (1)

Through this query the user can obtain the first record
Marks(001, 01, 56), which is sensitive information. A way
to solve this problem consists in virtually updating the
base relation according to the definition of the secrecy
view, making its extension null. In this way, the secret
information, i.e. the extension of the secrecy view, cannot
be revealed to the user. Here, in order to protect the tuple
Marks(001, 01, 56), the new instance $D'$ below is obtained
by virtually updating the original instance, changing the
attribute value 56 into NULL.



    Marks   studentID   courseID   mark
            001         01         NULL
            001         02         90
            002         02         70



Now, by posing the query about the secrecy view, i.e.

$Q_1(sid, cid, mark) \leftarrow Marks(sid, cid, mark),\ mark < 60,$

to $D'$, the user gets an empty answer, i.e. now $V_s(D') = \emptyset$.
This is because, in SQL databases, the comparison of NULL
with any other value is not evaluated as true.

Now, query (1) will get from $D'$ the first tuple with NULL
instead of 56, which the user can only interpret (misleadingly,
but as expected and intended) as an unknown or
missing value for that student in the instance at hand, $D$
(not $D'$, which is fully hidden from the user). ■
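
The virtual update of Example 1 can be imitated on an SQL engine. The following is only a minimal sketch, assuming SQLite (through Python's standard sqlite3 module) as the DBMS; the view name MarksVirtual and the CASE-based masking are illustrative assumptions, hard-wired to this particular secrecy view, and are not the general mechanism developed in this paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Marks (studentID TEXT, courseID TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO Marks VALUES (?, ?, ?)",
    [("001", "01", 56), ("001", "02", 90), ("002", "02", 70)],
)

# Hypothetical stand-in for the virtual instance D': the base table is not
# modified; a view replaces every mark below 60 by NULL.
conn.execute(
    """
    CREATE VIEW MarksVirtual AS
    SELECT studentID, courseID,
           CASE WHEN mark < 60 THEN NULL ELSE mark END AS mark
    FROM Marks
    """
)

# Query (1) posed to D'.
all_marks = conn.execute(
    "SELECT studentID, courseID, mark FROM MarksVirtual "
    "ORDER BY studentID, courseID"
).fetchall()

# The query about the secrecy view posed to D'.
secret = conn.execute(
    "SELECT studentID, courseID, mark FROM MarksVirtual WHERE mark < 60"
).fetchall()

print(all_marks)
print(secret)
```

The comparison NULL < 60 is not evaluated as true, so the second query returns no rows, while the first one still shows the student and course identifiers.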

Notice that, among other elements (cf. end of Section
IV), there are two that are crucial for this approach to work:
(a) The given database may contain null values, and whether it
has them or not is not known to the user, and (b) The
semantics of null values, including the logical operations
with them. In this second regard, we can say for the moment,
in intuitive terms, that we will base our work on the
SQL semantics of nulls, or, more precisely, on a logical
reconstruction of this semantics (cf. Sections II-A and II-B).

Hiding sensitive information is one of the concerns.
Another one is about still providing as much information
as possible to the user. In consequence, the virtual
updates have to be minimal in some sense, while still
doing their job of protecting data. In the previous example,
we might consider virtually deleting the whole tuple
Marks(001, 01, 56) to protect secret information, but we
would lose some useful information, like the student ID and
the course ID. Furthermore, the user should not be able
to guess the protected information by combining information
obtained from different queries.

As illustrated above, null values will be used to virtually 
update the database instance. Null values and incomplete 

1 We use Datalog notation for view definitions, and sometimes also for
queries.



databases have received the attention of the database com- 
munity E3, J29l, IH1, J23), Ql, and may have several 
possible interpretations, e.g. as a replacement for a real 
value that is non-existent, missing, unknown, inapplicable, 
etc. Several formal semantics have been proposed for them. 
Furthermore, it is possible to consider different, coexisting 
null values. In this work, we will use a single null value, 
denoted as above and in the rest of this paper, by null. 
Furthermore, we will treat null as the NULL in SQL 
relational databases. 

We want our approach to be applicable to, and imple- 
mentable on, DBMSs that conform to the SQL Standard, 
and are used in database practice. We concentrate on that 
scenario and SQL nulls, leaving for possible future work 
the necessary modifications for our approach to work with 
other kinds of null values. Since the SQL standard does
not provide a precise, formal semantics for NULL, we
define and adopt here a formal, logical reconstruction of
conjunctive query answering under SQL nulls (cf. Section
II-B). In this direction, we introduce unary predicates
IsNull and IsNotNull in logical formulas that are true only
when the argument is, resp. is not, the constant NULL. This
treatment of null values was first outlined in [9], but here
we make it precise. It captures the logic and the semantics
of the SQL NULL that are relevant for our work.$^{2}$ Including
this aspect of nulls in our work is necessary to provide the
basic scientific foundations for our approach to privacy.

In this paper, we consider only conjunctive secrecy views 
and conjunctive queries. The semantics of null-based virtual 
updates for data privacy that we provide is model-theoretic, 
in the sense that the possible admissible instances after the
update, the so-called secrecy instances, are defined and 
characterized. This definition captures the requirement that, 
on a secrecy instance, the extensions of the secrecy views 
contain only a tuple with null values or become empty. 
Furthermore, the secrecy instances do not depart from the 
original instance by more than necessary to enforce secrecy. 

Next, the semantics of secret answers to a query is
introduced. Those answers are invariant under the class of
secrecy instances. More precisely, a ground tuple $\bar{t}$ is a
secret answer to a first-order query $Q(\bar{x})$ from instance $D$ if it
is an answer to $Q(\bar{x})$ in every possible secrecy instance
for $D$. Of course, explicitly computing and materializing
all the secrecy instances to secretly answer a query is too
costly. Ways around this naive approach have to be found.
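
The naive approach just mentioned can be sketched as an intersection of answer sets. This is only an illustration under stated assumptions: instances are modelled as Python sets of tuples, None plays the role of null, and the two candidate instances below are hypothetical, chosen only to show the intersection (they are not claimed to be the secrecy instances of any database in this paper).

```python
def secret_answers(query, instances):
    """Return the answers to `query` that are shared by every instance."""
    answer_sets = [set(query(instance)) for instance in instances]
    if not answer_sets:
        return set()
    return set.intersection(*answer_sets)


# Two hypothetical candidate instances of a Marks relation.
instance_1 = {("001", "01", None), ("001", "02", 90), ("002", "02", 70)}
instance_2 = {("001", "01", None), ("001", "02", None), ("002", "02", 70)}


def all_marks(instance):
    # The query Q(sid, cid, mark) <- Marks(sid, cid, mark).
    return instance


shared = secret_answers(all_marks, [instance_1, instance_2])
print(shared)
```

Materializing every instance in this way is exactly what becomes too costly in general, which motivates the logic-program specification discussed next.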

Actually, we show that the class of secrecy instances,
for a given instance $D$ and set of secrecy views $\mathcal{V}^s$, can
be captured in terms of a disjunctive logic program with
stable model semantics [13], [16]. More precisely, there is
a one-to-one correspondence between the secrecy instances
and the stable models of the program. As a consequence,
the logic programs can be used to: (a) Compactly specify
(axiomatize) the class of secrecy instances; and (b) Compute
secret answers to queries by running the program on
top of the original instance.

2 The main issue in [9] was integrity constraint satisfaction in the
presence of nulls, for database repair and consistent query answering [3].

Our work has some similarities with that on database
repairs and consistent query answering (CQA) [3], [10]. In
that case, the problem is about restoring consistency of a
database with respect to a set of integrity constraints by means of
minimal updates. The alternative consistent instances that
emerge in this way are called repairs. They can be used to
characterize the consistent data in an inconsistent database
as the data that are invariant under the class of repairs. It is
possible to specify the repairs of a database by means of
disjunctive logic programs with stable model semantics (cf.
[5] for references on CQA).

Summarizing, in this paper we make the following
contributions: (a) We introduce secrecy views to specify
what to hide from a given user. (b) We introduce the
virtual secrecy instances that are obtained by minimally
changing attribute values into nulls, to make the secrecy
view extensions null. (c) We introduce the secret answers
as those that are certain for the class of secrecy instances.
Those are the answers returned to the user. (d) We establish
that this approach works, in the sense that the queries
about the secrecy view contents always return meaningless
answers; furthermore, the user cannot reconstruct the
original instance via secret answers to different queries.
(e) We provide a precise logical characterization of query
answering in databases with null values à la SQL. (f) We
specify by means of logic programs the secrecy instances of
a database, which allows for skeptical reasoning, and thus
certain query answering, directly from the specification.
(g) We establish some connections between secret query
answering and CQA in databases.

The structure of the rest of this paper is as follows.
In Section II we introduce basic notation and definitions,
including the semantics of conjunctive query answering
in databases with nulls. In Section III we introduce the
secrecy instances and investigate the properties of secrecy.
Section IV presents the notion of secret answer to a query.
Section V presents secrecy logic programs. Section VI investigates
the connection to database repairs and consistent
query answering. Section VII discusses related work. In
Section VIII we draw conclusions, and point to future work.

II. Preliminaries 

Consider a relational schema $\Sigma = (\mathcal{U}, \mathcal{R}, \mathcal{B})$, where $\mathcal{U}$ is
the possibly infinite database domain, with $null \in \mathcal{U}$, $\mathcal{R}$
is a finite set of database predicates, and $\mathcal{B}$ is a finite set
of built-in predicates, say $\mathcal{B} = \{=, \neq, >, <\}$. For an $n$-ary
predicate $R \in \mathcal{R}$, $R[i]$ denotes the $i$th position or attribute
of $R$, with $1 \leq i \leq n$. The schema determines a language
$\mathcal{L}(\Sigma)$ of first-order (FO) predicate logic, with predicates
in $\mathcal{R} \cup \mathcal{B}$ and constants in $\mathcal{U}$. A relational instance $D$ for
schema $\Sigma$ is a finite set of ground atoms of the form $R(\bar{a})$,
with $R \in \mathcal{R}$, and $\bar{a}$ a tuple of constants from $\mathcal{U}$ [1].

A query is a formula $Q(\bar{x})$ of $\mathcal{L}(\Sigma)$, with $n$ free variables
$\bar{x}$. $D \models Q[\bar{c}]$ denotes that instance $D$ makes $Q$ true with
the free variables taking values as in $\bar{c} \in \mathcal{U}^n$. In this
case, $\bar{c}$ is an answer to the query. $Q(D)$ denotes the set
of answers to query $Q$ from $D$. We will concentrate on
conjunctive queries, that is, $\mathcal{L}(\Sigma)$-formulas consisting of
a possibly empty prefix of existential quantifiers followed
by a conjunction of (database or built-in) atoms.

Example 2. Consider the following database instance $D_1$:

    R   A   B          S   B      C
        a   b              b      f
        c   d              d      g
        e   null           null   j



For the conjunctive query $Q_1(x, z): \exists y(R(x, y) \wedge S(y, z))$,
it holds, e.g. $D_1 \models Q_1[a, f]$. Actually, $Q_1(D_1) = \{(a, f),
(c, g), (e, j)\}$. Notice that here, and for the moment, we are
treating null as any other constant in the domain. ■
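
The classical evaluation of Example 2 can be reproduced with a short sketch. It assumes that relations are modelled as Python sets of tuples and that null is the ordinary string constant "null"; the function name q1_classic is hypothetical.

```python
NULL = "null"  # treated here as an ordinary domain constant

R = {("a", "b"), ("c", "d"), ("e", NULL)}
S = {("b", "f"), ("d", "g"), (NULL, "j")}


def q1_classic(r, s):
    """Answers to Q1(x, z): exists y (R(x, y) and S(y, z)), with null as a plain constant."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


answers = q1_classic(R, S)
print(sorted(answers))
```

Because null joins with null under this classical reading, the tuple (e, j) is among the answers.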

Data will be protected via a fixed set $\mathcal{V}^s$ of secrecy views
$V_s$. They are associated with a particular user or class of them.

Definition 1. A secrecy view $V_s$ is defined by a Datalog
rule of the form

$V_s(\bar{x}) \leftarrow R_1(\bar{x}_1), \ldots, R_n(\bar{x}_n), \varphi,$ (2)

with $R_i \in \mathcal{R}$, $\bar{x} \subseteq \bigcup_i \bar{x}_i$, and each $\bar{x}_i$ a tuple of variables.$^{3}$
Formula $\varphi$ is a conjunction of built-in atoms containing
terms, i.e. domain constants or variables. ■

We can see that a secrecy view is defined by a conjunctive
query with built-in predicates written in $\mathcal{L}(\Sigma)$. The conjunctive
query associated to the view in (2) is:

$Q^{V_s}(\bar{x}): \exists \bar{y}(R_1(\bar{x}_1) \wedge \cdots \wedge R_n(\bar{x}_n) \wedge \varphi),$ (3)

with $\bar{y} = (\bigcup_i \bar{x}_i) \setminus \bar{x}$. $Conj(\Sigma)$ denotes the class of
conjunctive queries of $\mathcal{L}(\Sigma)$, and $V_s(D)$ the extension of
view $V_s$ computed on instance $D$ for $\Sigma$. By definition,
$V_s(D) = Q^{V_s}(D)$.

Example 3. (Example 2 continued) For the given instance
$D_1$, consider the secrecy view defined by $V_s(x) \leftarrow
R(x, y), S(y, z)$. Here, the data protected by the view are
those that belong to its extension, namely $V_s(D_1) =
\{(a), (c), (e)\}$. Sometimes, to emphasize the view predicate
involved, we write instead $V_s(D_1) = \{V_s(a), V_s(c), V_s(e)\}$.
The corresponding conjunctive query is $Q^{V_s}(x):
\exists y \exists z(R(x, y) \wedge S(y, z))$. ■

Finally, an integrity constraint (IC) is a sentence $\psi$ of
$\mathcal{L}(\Sigma)$. $D \models \psi$ denotes that instance $D$ satisfies $\psi$. For a
fixed set $\mathcal{I}$ of ICs, we say that $D$ is consistent when $D \models \mathcal{I}$,
i.e. when $D$ satisfies each element of $\mathcal{I}$.

For both of the notions of query answer and IC satisfaction
above we are using the classic concept of satisfaction
of predicate logic, denoted with $\models$. According to it, the
constant null is treated as any other constant of the
database domain. We will use this notion at some places.
However, in order to capture the special role of null among
those constants, as in SQL databases, we will introduce next
a different notion, denoted with $\models_N$. In Example 2, under
the new semantics, and due to the participation of null in the
join, the tuple $(e, j)$ will not be an answer anymore, i.e.
$D_1 \not\models_N Q_1[e, j]$. The two notions, $\models$ and $\models_N$, will coexist
and also be related (cf. Section II-B).

3 We will frequently use Datalog notation for view definitions and
queries. When there is no possible confusion, we treat sequences of
variables as sets of variables, i.e. $x_1 \cdots x_n$ as $\{x_1, \ldots, x_n\}$.

A. Null value semantics: The gist 

In [12], Codd proposed a three-valued logic with truth
values true, false, and unknown for relational databases with
NULL. When a NULL is involved in a comparison operation,
the result is unknown. This logic has been adopted by the
SQL standard, and partially implemented in most common
commercial DBMSs (with some variations). As a result,
the semantics of NULL in both the SQL standard and the
commercial DBMSs is not quite clear; in particular, for IC
satisfaction in the presence of NULL.

The semantics for IC satisfaction with NULL introduced
in [9], [10] provides an FO semantics for nulls in SQL
databases. It is a reconstruction in classical logic of the
treatment of NULL in SQL DBs. More precisely, this
semantics captures the notion of satisfaction of ICs, and
also of query answering for a broad class of queries in
relational databases. In the rest of this section, we motivate
and sketch some of the elements of the notion of query
answer that we will use in the rest of this work. The details
can be found in Section II-B. In the following, we assume
that there is a single constant, null, to represent a null value.

A tuple $\bar{c}$ of elements of $\mathcal{U}$ is an answer to query $Q(\bar{x})$,
denoted $D \models_N Q[\bar{c}]$, if the formula (that represents)
$Q$ is classically true when the quantifiers on its relevant
variables (attributes) run over $(\mathcal{U} \setminus \{null\})$, and those on
the non-relevant variables run over $\mathcal{U}$. The free relevant
variables cannot take the value null either. For a precise
definition see Section II-B (and also [9], [10]).

Example 4. Consider the instance $D_2$ and query below:

    R   A      B      C          S   B
        1      1      1              null
        2      null   null           1
        null   3      3              3

$Q_2(x): \exists y \exists z(R(x, y, z) \wedge S(y) \wedge y > 2).$ (4)



A variable $v$ (quantified or not) in a conjunctive query is
relevant if it appears (non-trivially) twice in the formula
after the quantifier prefix [9]. Occurrences of the form
$v = null$ and $v \neq null$ do not count though. In query
(4), the only relevant quantified variable is $y$, because it
participates in a join and a built-in in the quantifier-free
matrix of $Q_2$. So, there are two reasons for $y$ to be relevant.
The only free variable is $x$, which is not relevant. As
for query answers, the only candidate values for $x$ are:
null, 2, 1. In this case, null is a candidate value because $x$
is a non-relevant variable.

First, $x = null$ is an answer to the query, because the
formula $\exists y \exists z(R(x, y, z) \wedge S(y) \wedge y > 2)$ is true in $D_2$,
with a non-null witness value for $y$ and a witness value for
$z$ that combined make the (non-quantified) formula true,
namely $y = 3$, $z = 3$. So, it holds $D_2 \models_N Q_2[null]$.

Next, $x = 2$ is not an answer, because, for this value of $x$,
the only candidate value for $y$, namely the null that accompanies 2
in $R$, makes the formula $(R(x, y, z) \wedge S(y) \wedge y > 2)$ false.
Even if it were true, this value for $y$ would not be allowed.

Finally, x = 1 is not an answer, because the only 
candidate value for y, namely 1, makes the formula false. 
In consequence, null is the only answer. ■ 
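
The case analysis of Example 4 can be checked mechanically. The sketch below hard-codes query (4) rather than implementing the general semantics: it assumes that None represents null, that the relevant variable y must be non-null, and that the non-relevant variables x and z may take any value. The function name is hypothetical.

```python
NULL = None  # None stands for the single null value

R = [(1, 1, 1), (2, NULL, NULL), (NULL, 3, 3)]
S = [(NULL,), (1,), (3,)]


def q2_null_answers(r, s):
    """Answers to Q2(x): exists y, z (R(x, y, z) and S(y) and y > 2), null-aware."""
    answers = set()
    for (x, y, z) in r:
        if y is NULL:
            # y is a relevant variable, so null is not an admissible witness.
            continue
        in_s = any(b is not NULL and b == y for (b,) in s)
        if in_s and y > 2:
            # x is not relevant, so it may be null in an answer.
            answers.add((x,))
    return answers


answers = q2_null_answers(R, S)
print(answers)
```

Only the third tuple of R contributes, with witnesses y = 3 and z = 3, so the single answer is the tuple containing null.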

This notion of query answer coincides with the classic
FO semantics for queries and databases without null values
[9], [10]. The next example with SQL queries and NULL
provides additional intuition and motivation for the formal
semantics of Section II-B. Notice the use in logical queries
of the new unary predicates IsNull and IsNotNull that we
also formally introduce in Section II-B.

Example 5. Consider the schema $\Sigma = \{R(A, B), S(B, C)\}$ and the
instance in the tables below. In it, NULL is the SQL null. If
this instance is stored in an SQL database, we can observe
the behavior of the following queries when they are directly
translated into SQL and run on an SQL DB:



    R   A      B          S   B      C
        a      b              b      h
        a      c              NULL   s
        d      NULL           l      m
        d      e
        u      u
        v      NULL
        v      r
        NULL   NULL



(a) $Q_1(x, y): R(x, y) \wedge y = null$
    SQL: Select * from R where B = NULL;
    Result: No tuple

(b) $Q_1'(x, y): R(x, y) \wedge IsNull(y)$
    SQL: Now uses IS NULL
    Result: (d, NULL), (v, NULL), (NULL, NULL)

(c) $Q_2(x, y): R(x, y) \wedge y \neq null$
    SQL: Select * from R where B <> NULL;
    Result: No tuple

(d) $Q_2'(x, y): R(x, y) \wedge IsNotNull(y)$
    SQL: Now uses IS NOT NULL
    Result: The five expected tuples

(e) $Q_3(x, y): R(x, y) \wedge x = y$
    SQL: Select * from R where A = B;
    Result: (u, u)

(f) $Q_4(x, y): R(x, y) \wedge x \neq y$
    SQL: Select * from R where A <> B;
    Result: Four tuples: (a, b), (a, c), (d, e), (v, r)

(g) $Q_5(x, y, x, z): R(x, y) \wedge R(x, z) \wedge y \neq z$
    SQL: Select * from R r1, R r2 where
         r1.A = r2.A and r1.B <> r2.B;
    Result: (a, b, a, c), (a, c, a, b)

(h) $Q_6(x, y, z, t): R(x, y) \wedge S(z, t) \wedge y = z$
    SQL: Select * from R r1, S s1
         where r1.B = s1.B;
    Result: (a, b, b, h)

(i) SQL: Select * from R r1 join S s1
         on r1.B = s1.B;
    Result:$^{4}$ (a, b, b, h)

(j) $Q_7(x, y, z, t): R(x, y) \wedge S(z, t) \wedge y \neq z$
    SQL: Select R1.A, R1.B, S1.B, S1.C
         from R R1, S S1 where R1.B <> S1.B;
    Result: (a, c, b, h), (d, e, b, h), (u, u, b, h), (v, r, b, h),
    (a, b, l, m), (a, c, l, m), (d, e, l, m), (u, u, l, m), (v, r, l, m) ■

4 The same result is obtained from DBMSs that do not require an
explicit equality together with the join.
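
The behavior reported in Example 5 can be reproduced on an actual engine. The following sketch assumes SQLite, accessed through Python's standard sqlite3 module, as the SQL DBMS; other systems may differ in details. The helper run and the variable names are hypothetical, and results are collected as sets because no row order is requested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A TEXT, B TEXT)")
conn.execute("CREATE TABLE S (B TEXT, C TEXT)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("d", None), ("d", "e"),
     ("u", "u"), ("v", None), ("v", "r"), (None, None)],
)
conn.executemany(
    "INSERT INTO S VALUES (?, ?)",
    [("b", "h"), (None, "s"), ("l", "m")],
)


def run(sql):
    """Run a query and return its rows as a set of tuples."""
    return set(conn.execute(sql).fetchall())


eq_null = run("SELECT * FROM R WHERE B = NULL")          # query (a)
is_null = run("SELECT * FROM R WHERE B IS NULL")         # query (b)
neq_null = run("SELECT * FROM R WHERE B <> NULL")        # query (c)
is_not_null = run("SELECT * FROM R WHERE B IS NOT NULL") # query (d)
same = run("SELECT * FROM R WHERE A = B")                # query (e)
differ = run("SELECT * FROM R WHERE A <> B")             # query (f)
join = run("SELECT * FROM R r1, S s1 WHERE r1.B = s1.B") # query (h)
non_join = run(
    "SELECT r1.A, r1.B, s1.B, s1.C FROM R r1, S s1 WHERE r1.B <> s1.B"
)                                                        # query (j)

print(eq_null, neq_null)
print(is_null)
print(join)
```

Queries (a) and (c) return no rows, whereas their IS NULL and IS NOT NULL counterparts return three and five rows, respectively.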

B. Semantics of query answers with nulls 

Here we introduce the semantics of FO conjunctive query
answering in relational databases with null values,$^{5}$ more
precisely, in SQL relational databases with a single null
value, null, that is handled like the SQL NULL. The
SQL queries are first reconstructed as queries in the FO
language $\mathcal{L}(\Sigma^{null})$ associated to $\Sigma^{null} = (\mathcal{U}, \mathcal{R}, \mathcal{B}^{null})$,
with $\mathcal{B}^{null} = \mathcal{B} \cup \{IsNull(\cdot), IsNotNull(\cdot)\}$. The last
two are new unary built-in predicates that correspond to
the SQL predicates IS NULL and IS NOT NULL, used to
check null values. Their intended semantics is as follows
(cf. Definition 4): $IsNull(null)$ is true, but $IsNull(c)$ is
false for any other constant $c$ in the database domain. And,
for any constant $d \in \mathcal{U}$, $IsNotNull(d)$ is true iff $IsNull(d)$
is false.

Introducing these predicates is necessary, because, as
shown in Example 5, in the presence of NULL, SQL
treats IS NULL and IS NOT NULL differently from $=$ and
$\neq$, resp. For example, the queries $Q(x): \exists y(R(x, y) \wedge
IsNull(y))$ and $Q'(x): \exists y(R(x, y) \wedge y = null)$ are both
conjunctive queries of $\mathcal{L}(\Sigma^{null})$, but in SQL relational
databases, they have different semantics.

In Example 5, each query $Q$ is defined by the formula
$\psi$ on the right-hand side. Below, we will identify the query
with its defining FO formula. Furthermore, we exclude from
the SQL-like conjunctive queries those like (a) and (c) in
Example 5.

Definition 2. (a) The class $Conj^{sql}(\Sigma^{null})$ contains all the
conjunctive queries in $\mathcal{L}(\Sigma^{null})$ of the form

$Q(\bar{x}): \exists \bar{y}(A_1(\bar{x}_1) \wedge \cdots \wedge A_n(\bar{x}_n)),$ (5)

where $\bar{y} \subseteq \bigcup_i \bar{x}_i$, $\bar{x} = (\bigcup_i \bar{x}_i) \setminus \bar{y}$, and the $A_i$ are atoms
containing any of the predicates in $\mathcal{R} \cup \mathcal{B}^{null}$ plus terms, i.e.
variables or constants in $\mathcal{U}$. Furthermore, those atoms are
never of the form $t = null$, $null = t$, $t \neq null$, $null \neq t$,
with $t$ a term, null or not.

(b) With $Conj(\Sigma^{null})$ we denote the class of all conjunctive
queries of the form (5), but without the restrictions on
(in)equality atoms imposed on $Conj^{sql}(\Sigma^{null})$. ■

The idea here is to force conjunctive queries à la SQL,
i.e. those in $Conj^{sql}(\Sigma^{null})$, that explicitly mention the
null value in (in)equalities, to use the built-ins IsNull or
IsNotNull. Notice that the class $Conj(\Sigma^{null})$ includes both
$Conj^{sql}(\Sigma^{null})$ and $Conj(\Sigma)$.

Definition 3. Consider a query in $Conj(\Sigma^{null})$ of the form
$Q(\bar{x}): \exists \bar{y}\, \psi(\bar{x}, \bar{y})$, with $\exists \bar{y}$ a possibly empty prefix of
existential quantifiers, and $\psi$ a quantifier-free conjunction
of atoms. A variable $v$ is relevant for $Q$ [10] if it occurs at
least twice in $\psi$, without considering the atoms $IsNull(v)$,
$IsNotNull(v)$, $v\, \theta\, null$, or $null\, \theta\, v$, with $\theta \in \mathcal{B}$. $V^R(Q)$
denotes the set of relevant variables for $Q$. ■

5 This semantics can be extended to a broader class of queries and also to
integrity constraint satisfaction. It builds upon a similar and more general
semantics first introduced in [9], [10].

For example, for the query $Q(x): \exists y(P(x, y, z) \wedge Q(y) \wedge
IsNull(y))$, $V^R(Q(x)) = \{y\}$, because $y$ is used twice in
the subformula $P(x, y, z) \wedge Q(y)$.
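
The counting in Definition 3 can be sketched as follows. The encoding is an assumption made only for this illustration: an atom is a pair (predicate, arguments), variables are given as an explicit set of names, and the built-in comparisons are written "=", "!=", "<" and ">".

```python
from collections import Counter

NULL = "null"
BUILTINS = {"=", "!=", "<", ">"}


def relevant_variables(atoms, variables):
    """Return the variables occurring at least twice, ignoring null tests."""
    counts = Counter()
    for predicate, args in atoms:
        if predicate in ("IsNull", "IsNotNull"):
            continue  # IsNull(v) and IsNotNull(v) do not count
        if predicate in BUILTINS and NULL in args:
            continue  # comparisons of a variable with null do not count
        for term in args:
            if term in variables:
                counts[term] += 1
    return {v for v, n in counts.items() if n >= 2}


# The query of the example above: P(x, y, z), Q(y), IsNull(y).
example = [("P", ("x", "y", "z")), ("Q", ("y",)), ("IsNull", ("y",))]
print(relevant_variables(example, {"x", "y", "z"}))

# Query (4): R(x, y, z), S(y), y > 2.
query_4 = [("R", ("x", "y", "z")), ("S", ("y",)), (">", ("y", 2))]
print(relevant_variables(query_4, {"x", "y", "z"}))
```

In both cases the only relevant variable is y.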

As usual in FO logic, we consider assignments from the
set, $Var$, of variables to the underlying database domain $\mathcal{U}$
(that contains the constant null), i.e. $s: Var \rightarrow \mathcal{U}$. Such an
assignment can be extended to terms, as $\bar{s}$. It maps every
variable $x$ to $s(x)$, and every element $c$ of $\mathcal{U}$ to $c$. For an
assignment $s$, a variable $y$ and a constant $c$, $s^{y}_{c}$ denotes
the assignment that coincides with $s$ everywhere, possibly
except on $y$, which takes the value $c$. Given a formula $\psi$,
$\psi[s]$ denotes the formula obtained from $\psi$ by replacing its
free variables by their values according to $s$.

Now, given a formula (query) $\chi$ and a variable assignment
function $s$, we verify if instance $D$ satisfies $\chi[s]$ by
assuming that the quantifiers on relevant variables range
over $(\mathcal{U} \setminus \{null\})$, and those on non-relevant variables
range over $\mathcal{U}$. More precisely, we define, by induction on $\chi$,
when $D$ satisfies $\chi$ with assignment $s$, denoted $D \models_N \chi[s]$.

Definition 4. Let $\chi$ be a query in $Conj(\Sigma^{null})$, and $s$
an assignment. The pair $D, s$ satisfies $\chi$ under the null-semantics,
denoted $D \models_N \chi[s]$, exactly in the following
cases: (below $t, t_1, \ldots$ are terms; and $x, x_1, x_2$ variables)

1. (a) $D \models_N IsNull(t)[s]$, with $s(t) = null$. (b) $D \models_N
IsNotNull(t)[s]$, with $s(t) \neq null$.

2. $D \models_N (t_1 < t_2)[s]$, with $s(t_1) \neq null \neq s(t_2)$, and
$s(t_1) < s(t_2)$ (similarly for $>$).$^{6}$

3. (a) $D \models_N (x = c)[s]$, with $s(x) = c \in (\mathcal{U} \setminus \{null\})$
(or symmetrically).$^{7}$

(b) $D \models_N (x_1 = x_2)[s]$, with $s(x_1) = s(x_2) \neq null$.

(c) $D \models_N (c = c)[s]$, with $c \in (\mathcal{U} \setminus \{null\})$.

4. (a) $D \models_N (x \neq c)[s]$, with $null \neq s(x) \neq c \in (\mathcal{U} \setminus
\{null\})$ (or symmetrically).

(b) $D \models_N (c_1 \neq c_2)[s]$, with $c_1 \neq c_2$, and $c_1, c_2 \in (\mathcal{U} \setminus
\{null\})$.

5. $D \models_N R(t_1, \ldots, t_n)[s]$, with $R \in \mathcal{R}$, and $R(s(t_1), \ldots,
s(t_n)) \in D$.

6. $D \models_N (\alpha \wedge \beta)[s]$, with $\alpha, \beta$ quantifier-free, $s(y) \neq null$
for every $y \in V^R(\alpha \wedge \beta)$, and $D \models_N \alpha[s]$ and $D \models_N \beta[s]$.

7. $D \models_N (\exists y\, \alpha)[s]$ when: (a) if $y \in V^R(\alpha)$, there is $c$ in
$(\mathcal{U} \setminus \{null\})$ with $D \models_N \alpha[s^{y}_{c}]$; or (b) if $y \notin V^R(\alpha)$, there
is $c$ in $\mathcal{U}$ with $D \models_N \alpha[s^{y}_{c}]$. ■

This semantics can be applied to conjunctive queries in
$Conj^{sql}(\Sigma^{null})$. The notion of relevant attribute and this
semantics of query satisfaction can both be extended to
more complex formulas. In particular, they can be applied
also to the satisfaction of integrity constraints under SQL
null values [10], [9].

Definition 5. [10] Let $Q(\bar{x}): \exists \bar{y}\, \psi(\bar{x}, \bar{y})$ be a conjunctive
query in $Conj(\Sigma^{null})$, with $\bar{x} = x_1, \ldots, x_n$.

(a) A tuple $(c_1, \ldots, c_n) \in \mathcal{U}^n$ is an answer from $D$ under
the null query answering semantics to $Q$, in short, an N-answer,
denoted $D \models_N Q[c_1, \ldots, c_n]$, iff there exists an
assignment $s$ such that $s(x_i) = c_i$, for $i = 1, \ldots, n$; and
$D \models_N (\exists \bar{y}\, \psi)[s]$.

(b) $Q^N(D)$ denotes the set of N-answers to $Q$ from
instance $D$. Similarly, $V^N(D)$ denotes a view extension according
to the N-answer semantics: $V^N(D) = (Q^V)^N(D)$.

(c) If $Q$ is a sentence (boolean query), the N-answer is yes
iff $D \models_N Q$, and no, otherwise. ■

6 Of course, when there is an order relation on $\mathcal{U}$.
7 Here we use the symbols $=$ and $\neq$ both at the object and the meta
levels, but there should not be a confusion since valuations are involved.

Notice that $D \models_N (\exists \bar{y}\, \psi)[s]$ in (a) above requires, according
to Definition 4, that the variables in the existential prefix
$\exists \bar{y}$ that are relevant do not take the value null. The free
variables $x_i$ in $Q(\bar{x})$ may take the value null only when
they are not relevant in the query. Example 4 illustrates this
definition. In it, since the free variable $x$ is not relevant,
$Q_2^N(D_2) = \{(null)\}$. Similarly, in Example 2 it holds:
$Q_1^N(D_1) = \{(a, f), (c, g)\} \subsetneq Q_1(D_1)$.

Actually, it is easy to prove that, for queries in
$Conj(\Sigma^{null})$, it holds in general: $Q^N(D) \subseteq Q(D)$. Furthermore,
the N-query answering semantics coincides with
classical FO query answering semantics in databases without
null values [10], [9]. More precisely, if $null \notin \mathcal{U}$ (and
then it does not appear in $D$ or $Q$ either): $D \models_N Q[\bar{t}]$ iff
$D \models Q[\bar{t}]$.

Furthermore, every conjunctive query in $Conj(\Sigma^{null})$
can be syntactically transformed into a new FO query for
which the evaluation can be done by treating null as any
other constant [10], [9]. (A similar transformation will be
found in Proposition 1 below.)

More precisely, a conjunctive query $Q(\bar{x}) \in
Conj(\Sigma^{null})$, i.e. of the form (5), can be rewritten
into a classic conjunctive query, as follows:

$Q^{rw}(\bar{x}): \exists \bar{y}\big(A_1(\bar{x}_1) \wedge \cdots \wedge A_n(\bar{x}_n) \wedge \bigwedge_{v \in V^R(Q)} v \neq null\big).$ (6)

It holds: $D \models_N Q[\bar{c}]$ iff $D \models Q^{rw}[\bar{c}]$. Here, on
the right-hand side, we have classic FO satisfaction, and
null is treated as an ordinary constant in the domain. This
transformation ensures that relevant variables range over
$(\mathcal{U} \setminus \{null\})$. Query $Q^{rw}(\bar{x})$ belongs to $Conj(\Sigma^{null})$, and it
may contain atoms of the form $IsNull(t)$ or $IsNotNull(t)$.
However, replacing them by $t = null$ or $t \neq null$, resp.,
leads to a query in $Conj(\Sigma)$ that has the same answers as
$Q^{rw}$ (under the same classic semantics).

Example 6. (Example 4 continued) Query $Q_2$ in (4) can be
rewritten as

$Q_2^{rw}(x): \exists y \exists z(R(x, y, z) \wedge S(y) \wedge y > 2 \wedge y \neq null).$

We had $D_2 \not\models_N Q_2[1]$. Now also $D_2 \not\models \exists y \exists z(R(1, y, z) \wedge
S(y) \wedge y > 2 \wedge y \neq null)$ under classic query evaluation,
with null treated as an ordinary constant. Similarly, $D_2 \not\models
Q_2^{rw}[2]$ due to the new conjunct $y \neq null$. Finally, $D_2 \models
Q_2^{rw}[null]$ because $D_2 \models (R(null, 3, 3) \wedge S(3) \wedge 3 > 2 \wedge 3 \neq
null)$. Since null is treated as any other constant, we can
compare it with 3. By the unique names assumption, it
holds $null \neq 3$. ■
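
The classic evaluation of the rewritten query can also be sketched in code. The sketch hard-codes $Q_2^{rw}$ over the instance of Example 4 and assumes that null is the ordinary string constant "null"; the conjunct excluding null is tested before the order comparison, since an order between "null" and numbers is not assumed here.

```python
NULL = "null"  # an ordinary constant under classic evaluation

R = [(1, 1, 1), (2, NULL, NULL), (NULL, 3, 3)]
S = [(NULL,), (1,), (3,)]


def q2_rewritten(r, s):
    """Classic answers to exists y, z (R(x,y,z) and S(y) and y > 2 and y != null)."""
    return {
        (x,)
        for (x, y, z) in r
        for (y2,) in s
        # The join y == y2 treats null as a plain constant; the added
        # conjunct y != NULL then discards null witnesses before y > 2 is tested.
        if y == y2 and y != NULL and y > 2
    }


rewritten_answers = q2_rewritten(R, S)
print(rewritten_answers)
```

The result contains only the tuple with null, in agreement with the N-answers obtained in Example 4.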



Although our framework provides a precise semantics for
conjunctive queries in $Conj(\Sigma)$ or $Conj(\Sigma^{null})$, in both
cases possibly containing (in)equalities involving null, a
usual conjunctive query in SQL should be first translated
into a conjunctive query $Q$ in $Conj^{sql}(\Sigma^{null})$ if we want
to retain its intended semantics. After that, $Q^{rw}$ can be
computed.

III. Secrecy Instances 

In this work we will make use of null to protect secret 
information. The basic idea that we develop in this and the 
next sections is that the extensions of the secrecy views, 
obtained as query answers, should contain only the tuple 
with nulls or become empty. In this case we will say that 
the view is null. 

Definition 6. A query $Q(\bar{x})$ is null on instance D if
$Q^N(D) \subseteq \{(\mathit{null}, \ldots, \mathit{null})\}$ (with the tuple inside
of the same length as $\bar{x}$). A view $V(\bar{x})$ is null on D if the
query defining it is null on D. ■

Example 7. (example 4 continued) Consider the secrecy
view $V_s(x) \leftarrow R(x,y,z), S(y), y > 2$. Its corresponding
FO query $Q^{V_s}(x)$ is the following:

$$Q_2(x) :\ \exists y \exists z\,(R(x,y,z) \wedge S(y) \wedge y > 2).$$

Under the semantics of secrecy in the presence of null, we
expect the view to be null. This requires the values for
attribute A associated with variable x in $Q_2$ to be null,
or the values in B associated with variable y in $Q_2$ to be
null, or the negation of the comparison to be true. These
three cases correspond to the three assignments of Example
4. Thus, the view extension is $V_s(D_2) = \{(\mathit{null})\}$, which
shows that the view is null on $D_2$. ■

In this example we are in an ideal situation, in the sense
that we did not have to change the instance to obtain
a "secret answer". However, this may be an exceptional
situation, and we will have to virtually "distort" the given
instance by replacing as few non-null attribute values as
possible by null. More generally, since it does not necessarily
hold that each secrecy view becomes null on an instance D
at hand, the view extensions will be obtained from an
alternative, possibly virtual, version D' of D that does
make each of those views null. In this sense, D' will be an
admissible instance (cf. Definition 7 below). At the same
time, we want D' to stay as close as possible to D (cf.
Definition 11 below). Since there may be more than one
such instance D', we query all of them simultaneously, and
return the certain answers [18] (cf. Definition 12 below).
Each of the query and view evaluations is done according
to the notion of N-answer introduced in Section II-B.

First, we define the instances that make the secrecy views 
empty or null. 

Definition 7. An instance D for schema $\Sigma$ is admissible for
a set $\mathcal{V}^s$ of secrecy views of the form (2) if, under the
N-answer semantics (cf. Definition 5), each $V_s(D)$ is empty
or in all its tuples only null appears. $Admiss(\mathcal{V}^s)$ denotes
the set of admissible instances. ■

As Example 7 shows, $D_2$ is admissible for the given
view. It also shows that there are some attributes that are
particularly relevant for the view to be null, A and B in
that case. In the following, we make precise this notion
of secrecy-relevant attribute (cf. Definition 8(d) below).
Before, we used (plain) "relevance" associated to variables
for query answering under nulls. Not surprisingly, the new
notion is based on the previous one. This will allow us to
provide an alternative and more operational characterization
of secrecy instances (cf. Proposition 1 below).

Definition 8. Consider a view $V_s$ defined as in (2).

(a) For $R \in \mathcal{R}$ in the body of (2) and a term t (i.e. a variable
or constant), $pos^R(V_s, t)$ denotes the set of positions in R
where t appears in the body of $V_s$'s definition.

(b) The set of combination attributes for $V_s$ is:

$$C(V_s) = \{R[i] \mid \text{for a relevant variable } v,\ i \in pos^R(V_s, v)\}.$$

(c) The set of secrecy attributes for $V_s$ is:

$$S(V_s) = \{R[i] \mid \text{for an } x \text{ in } V_s(\bar{x}) \text{ in (2)},\ i \in pos^R(V_s, x)\}.$$

(d) The s-relevant attributes⁸ for a secrecy view
$V_s$ are those (associated to positions) in the set
$A(V_s) = C(V_s) \cup S(V_s)$. ■

Combination attributes for a secrecy view $V_s$ are those
involved in joins or built-in predicates (other than built-ins
with explicit null). Secrecy attributes are those appearing
in the head of $V_s$'s definition, and accordingly, collect the
query answers, which are expected to be secret. Hence,
"secrecy attributes". They correspond to the free variables
in the associated query $Q^{V_s}$.

Example 8. (example 7 continued) Consider again the
secrecy view $V_s(x) \leftarrow R(x,y,z), S(y), y > 2$. Here
$C(V_s) = \{R[2], S[1]\}$, because y is the only relevant
variable; and $S(V_s) = \{R[1]\}$, because x is the only
free variable. In consequence, $A(V_s) = \{R[1], S[1], R[2]\}$.
Attribute C, i.e. $R[3]$, is not s-relevant. Actually, its value
is not relevant to obtain the view extension. ■
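
The sets $C(V_s)$ and $S(V_s)$ can be read off a view definition syntactically. The sketch below is only illustrative and rests on a stated assumption: a variable is taken to be relevant when it occurs at least twice in the body, counting occurrences in built-in atoms that do not mention null explicitly. This matches the informal description above; the formal notion of relevance is the one given earlier in the paper. The function name `attribute_sets` and the encoding of atoms as pairs are hypothetical.

```python
from collections import Counter

def attribute_sets(head_vars, body_atoms, builtin_vars=()):
    """Return (comb, secr): combination and secrecy attributes,
    each as a set of (relation, position) pairs, positions from 1.

    Assumption: a variable is relevant if it occurs at least twice
    in the body, counting built-ins without explicit null.
    """
    counts = Counter(v for _, args in body_atoms for v in args)
    counts.update(builtin_vars)
    relevant = {v for v, n in counts.items() if n >= 2}
    positions = [
        (rel, i, v)
        for rel, args in body_atoms
        for i, v in enumerate(args, start=1)
    ]
    comb = {(rel, i) for rel, i, v in positions if v in relevant}
    secr = {(rel, i) for rel, i, v in positions if v in head_vars}
    return comb, secr

# Example 8: V_s(x) <- R(x, y, z), S(y), y > 2
comb, secr = attribute_sets(
    {"x"}, [("R", ("x", "y", "z")), ("S", ("y",))], builtin_vars=["y"]
)
```

For the view of Example 8 this yields the positions R[2], S[1] as combination attributes and R[1] as the only secrecy attribute; their union gives the s-relevant attributes.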

The following proposition provides a characterization of
admissible instances for a set of secrecy views in terms
of classic FO satisfaction (cf. [24, Proposition 1]). In it we
use the notation $D \models \gamma$ for the classic notion of satisfaction
by an instance D of FO formula $\gamma$, where null is treated
as any other constant.

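Proposition 1. Let $\mathcal{V}^s$ be a set of secrecy views, each of
whose elements $V_s$ is of the form (2), and has an expression
$Q^{V_s}(\bar{x}) : \exists \bar{y}\,(\bigwedge_{i=1}^{n} R_i(\bar{x}_i) \wedge \varphi)$ as a conjunctive query. For
an instance D, $D \in Admiss(\mathcal{V}^s)$ iff for each $V_s \in \mathcal{V}^s$,
$D \models \text{Null-}V_s$, where Null-$V_s$ is the following sentence
associated to $Q^{V_s}$:

$$\bar{\forall}\Big(\bigwedge_{i=1}^{n} R_i(\bar{x}_i) \ \rightarrow \bigvee_{v \in (\bigcup \bar{x}_i) \cap C(V_s)} v = \mathit{null} \ \ \vee \bigwedge_{u \in (\bigcup \bar{x}_i) \cap S(V_s)} u = \mathit{null} \ \ \vee\ \neg\varphi\Big). \qquad (7)$$ ■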

8 For distinction from the notion of relevant attribute/variable used in
Sections II-A and II-B.



In the proposition, $\bar{\forall}$ denotes the universal closure of the
formula that follows it; and $v \in ((\bigcup \bar{x}_i) \cap C(V_s))$ indicates
that variable v appears in some of the atoms $R_i(\bar{x}_i)$ and in
a combination attribute, etc.

Sentence Null-$V_s$ in (7) originates in the FO rewriting
$(Q^{V_s})^{rw}$, as in (6), of the query $Q^{V_s}$ associated to $V_s$, and
the requirement that the latter becomes null on D.

Example 9. (example 8 continued) According to the above
proposition, in order to check whether the database instance
$D_2$ is admissible, the following must hold:

$$D_2 \models \forall x \forall y \forall z\,(R(x,y,z) \wedge S(y) \ \rightarrow\ x = \mathit{null} \vee y = \mathit{null} \vee y \leq 2).$$

When checking this sentence on $D_2$, null is treated as any
other constant. Notice that the values for the non-s-relevant
attributes do not matter.

For x = 1, y = 1, the antecedent of the implication is
satisfied. For these values, the consequent is also satisfied,
because $y = 1 \leq 2$. For x = 2, y = null, the consequent
is satisfied since y is null. For x = null, y = 3, the
antecedent is satisfied. For these values, the consequent is
also satisfied, because null = null is true. So, $D_2 \models \text{Null-}V_s$,
and instance $D_2$ is admissible. ■
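
The admissibility test of Example 9 amounts to checking a universal sentence with null handled as a constant. A minimal sketch follows; the instance below is hypothetical (the original $D_2$ is not reproduced here) and only mirrors the three value combinations discussed in the example.

```python
NULL = "null"  # ordinary constant for the purpose of this check

# Hypothetical instance with the combinations (1, 1), (2, null), (null, 3)
R = {(1, 1, 1), (2, NULL, 2), (NULL, 3, 3)}
S = {(1,), (NULL,), (3,)}

def satisfies_null_vs(r_tuples, s_tuples):
    """Check: for all x, y, z, if R(x, y, z) and S(y) hold, then
    x = null or y = null or y <= 2."""
    for (x, y, z) in r_tuples:
        if (y,) not in s_tuples:
            continue  # antecedent false, nothing to check
        # y <= 2 is only evaluated when y is not null
        if not (x == NULL or y == NULL or y <= 2):
            return False
    return True

is_admissible = satisfies_null_vs(R, S)
```

Adding a tuple such as R(5, 3, 0) to this instance would violate the sentence, since then a non-null value for x would be joined through y = 3 > 2.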

The next step consists in selecting from the admissible 
instances those that are close to the database we are 
protecting. This requires introducing a notion of distance or 
an order relationship between instances for a same schema. 
This would allow us to talk about minimality of change. 
Since, in order to enforce privacy on an instance D, we will 
virtually change attribute values by null, the comparison of 
instances has to take this kind of changes and the presence 
of null in tuples into account. Intuitively, a secrecy instance 
for D will be admissible and also minimally differ from D. 

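Definition 9. (a) The binary relation $\sqsubset$ on the database
domain U is defined as follows: $c \sqsubset d$ iff $c = \mathit{null}$ and
$d \neq \mathit{null}$. Its reflexive closure is $\sqsubseteq$.

(b) For $\bar{t}_1 = (c_1, \ldots, c_n)$ and $\bar{t}_2 = (d_1, \ldots, d_n)$ in $U^n$:
$\bar{t}_1 \sqsubseteq \bar{t}_2$ iff $c_i \sqsubseteq d_i$ for each $i \in \{1, \ldots, n\}$. Also, $\bar{t}_1 \sqsubset \bar{t}_2$
iff $\bar{t}_1 \sqsubseteq \bar{t}_2$ and $\bar{t}_1 \neq \bar{t}_2$. ■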

This partial order relationship $\bar{t}_1 \sqsubseteq \bar{t}_2$ indicates that $\bar{t}_1$
is less or equally informative than $\bar{t}_2$. For example, tuple
(a, null) provides less information than tuple (a, b). Then,
$(a, \mathit{null}) \sqsubset (a, b)$ holds.

In order to capture the fact that we are just modifying
attribute values, but not inserting or deleting tuples, we
will assume (sometimes implicitly) that database tuples
have tuple identifiers. More precisely, each predicate has an
additional, first, attribute ID, which is a key for the relation,
and whose values are taken in $\mathbb{N}$ and not subject to changes.
In consequence, tuples in an instance D will be of the form
$R(k, \bar{t})$, with $k \in \mathbb{N}$ and $\bar{t} \in U^n$, and $R \in \mathcal{R}$ is, implicitly,
of arity n + 1. Below, we will consider only instances D'
that are correlated to D, i.e. there is a surjective function
$\kappa$ from D to D', such that $\kappa(R(k, \bar{t})) = R(k, \bar{t}')$, for some
$\bar{t}'$. This mapping respects the predicate name and the tuple
identifier. We say that D' is D-correlated (via $\kappa$). In the
rest of this section, D is a fixed instance, the one under
privacy protection. We will usually omit tuple identifiers.

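Definition 10. (a) For database tuples $R_1(k_1, \bar{t}_1)$,
$R_2(k_2, \bar{t}_2)$: $R_1(k_1, \bar{t}_1) \sqsubseteq R_2(k_2, \bar{t}_2)$ iff $R_1 = R_2$, $k_1 = k_2$,
and $\bar{t}_1 \sqsubseteq \bar{t}_2$.

(b) For instances $D_1, D_2$: $D_1 \sqsubseteq D_2$ iff for every tuple
$R_1(k_1, \bar{t}_1) \in D_1$, there is a tuple $R_2(k_2, \bar{t}_2) \in D_2$ with
$R_2(k_2, \bar{t}_2) \sqsupseteq R_1(k_1, \bar{t}_1)$.

(c) For D-correlated instances $D_1, D_2$: $D_1 \leq_D D_2$ iff: i.
$D_1, D_2 \sqsubseteq D$ and ii. $D_2 \sqsubseteq D_1$. As usual, $D_1 <_D D_2$ iff
$D_1 \leq_D D_2$, but not $D_2 \leq_D D_1$. ■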

Notice that the condition (c)i. for the partial order $\leq_D$
forces $D_1$ and $D_2$ to be obtained from D by updating
attribute values by null. Condition (c)ii. inverts the partial
order $\sqsubseteq$ between tuples (and between instances). The reason
is that we want secrecy instances to be minimal wrt the set
of changes of attribute values by nulls (as customary for
database repairs [5]). Informally, when $D_1 <_D D_2$, $D_1$
is obtained from D, in comparison with $D_2$, via "fewer"
replacements of values by nulls, and then is closer to D.
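
The orders of Definitions 9 and 10 are straightforward to operationalize. The following sketch is one possible encoding, under the assumption that an instance is represented as a dictionary from (relation name, tuple identifier) to the tuple of remaining values; this representation and the function names are ours, chosen only for illustration.

```python
NULL = "null"

def value_leq(c, d):
    """c is below d: either the same value, or c is null."""
    return c == d or c == NULL

def tuple_leq(t1, t2):
    """Componentwise extension of value_leq to tuples of equal length."""
    return len(t1) == len(t2) and all(value_leq(c, d) for c, d in zip(t1, t2))

def instance_leq(d1, d2):
    """Every tuple of d1 has a tuple in d2 with the same relation and
    identifier that is at least as informative."""
    return all(key in d2 and tuple_leq(t, d2[key]) for key, t in d1.items())

def leq_D(d1, d2, d):
    """d1 is at least as close to d as d2 (both obtained from d via nulls)."""
    return instance_leq(d1, d) and instance_leq(d2, d) and instance_leq(d2, d1)

def lt_D(d1, d2, d):
    return leq_D(d1, d2, d) and not leq_D(d2, d1, d)

# Instances D, D3 and D4 of Example 10 below, keyed by (relation, id)
D = {("P", 1): (1, 2), ("R", 1): (2, 1)}
D3 = {("P", 1): (1, 2), ("R", 1): (NULL, 1)}
D4 = {("P", 1): (1, NULL), ("R", 1): (NULL, 1)}
d3_closer = lt_D(D3, D4, D)
```

With these instances, D3 is strictly closer to D than D4, since D4 changes one more value into null.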

Definition 11. An instance $D_s$ is a secrecy instance for D
wrt a set $\mathcal{V}^s$ of secrecy views iff: (a) $D_s \in Admiss(\mathcal{V}^s)$, and
(b) $D_s$ is $\leq_D$-minimal in the class of D-correlated database
instances that satisfy (a). (I.e. there is no instance D' in that
class with $D' <_D D_s$.) $Sec(D, \mathcal{V}^s)$ denotes the set of all
the secrecy instances for D wrt $\mathcal{V}^s$. ■

Notice that a secrecy instance nullifies all the secrecy views,
is obtained from D by changing attribute values by null,
and the set of changes is minimal wrt set inclusion.⁹

Example 10. Consider the instance $D = \{P(1,2), R(2,1)\}$
for schema $\mathcal{R} = \{P(A,B), R(B,C)\}$. With tuple identifiers
(as first arguments), it takes the form $D = \{P(1,1,2),
R(1,2,1)\}$. Consider also the secrecy view:

$$V_s(x,z) \leftarrow P(x,y), R(y,z),\ y < 3.$$

D itself is not admissible (it does not nullify the secrecy
view), and then it is not a secrecy instance either. Now,
consider the following alternative updated instances $D_i$:

D_1 = {P(1, null, 2), R(1, 2, null)}
D_2 = {P(1, 1, null), R(1, 2, 1)}
D_3 = {P(1, 1, 2), R(1, null, 1)}
D_4 = {P(1, 1, null), R(1, null, 1)}

For example, for $D_1$ the set of changes can be identified
with the set of changed positions: $U_1 = \{P[1], R[2]\}$ (ID
has position 0). The $D_i$ are all admissible, that is (cf. (7)):

$$D_i \models \forall x \forall y \forall z\,\big(P(x,y) \wedge R(y,z) \ \rightarrow\ (y = \mathit{null} \vee (x = \mathit{null} \wedge z = \mathit{null}) \vee y \geq 3)\big).$$

$D_1$, $D_2$, and $D_3$ are the only three secrecy instances,
i.e. they are $\leq_D$-minimal: The sets of changes $U_1$, $U_2 = \{P[2]\}$,
and $U_3 = \{R[1]\}$ are all incomparable under set
inclusion. $D_4$ is not minimal, because $U_4 = \{P[2], R[1]\} \supsetneq U_3$,
which is also reflected in the fact that $P(1,1,\mathit{null}) \sqsubset P(1,1,2)$;
and then, $D_3 <_D D_4$. ■

9 As opposed to minimizing the cardinality of that set. Cf. for a
discussion of different forms of "repairs" of databases.

10 It would be easy to consider tuple ids in queries and view definitions,
but they do not contribute to the final result and will only complicate the
notation. So, we skip tuple ids whenever possible.
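
For an instance as small as the one in Example 10, the secrecy instances can be found by exhaustive search: try every set of positions to be replaced by null, keep the admissible outcomes, and then keep the change sets that are minimal under set inclusion. The sketch below does exactly this. It is meant as an illustration of Definition 11, not as a practical method, since the number of candidate change sets is exponential; the encoding with one tuple per relation is specific to this example.

```python
from itertools import combinations

NULL = "null"
D = {"P": (1, 2), "R": (2, 1)}  # Example 10, tuple identifiers omitted
POSITIONS = [("P", 1), ("P", 2), ("R", 1), ("R", 2)]

def apply_changes(changes):
    """Replace by null the values at the given (relation, position) pairs."""
    return {
        rel: tuple(NULL if (rel, i) in changes else v
                   for i, v in enumerate(tup, start=1))
        for rel, tup in D.items()
    }

def admissible(inst):
    """Admissibility sentence for V_s(x, z) <- P(x, y), R(y, z), y < 3,
    with null treated as an ordinary constant."""
    (x, y), (y2, z) = inst["P"], inst["R"]
    if y != y2:  # the two atoms do not join, antecedent is false
        return True
    return y == NULL or (x == NULL and z == NULL) or y >= 3

def minimal_change_sets():
    """Admissible change sets that have no admissible proper subset."""
    candidates = [
        frozenset(c)
        for k in range(len(POSITIONS) + 1)
        for c in combinations(POSITIONS, k)
        if admissible(apply_changes(frozenset(c)))
    ]
    return [c for c in candidates if not any(o < c for o in candidates)]

change_sets = minimal_change_sets()
```

The three change sets obtained correspond to $U_2$, $U_3$ and $U_1$ of the example; the set $U_4$ is admissible but is discarded because it properly contains $U_3$.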



IV. Privacy Preserving Query Answers 

Now we want to define and compute the secret answers to 
queries from a given database D that is subject to privacy 
constraints, as represented by the nullification of the secrecy 
views. They will be defined on the basis of the class of 
secrecy instances for D. This class will be queried instead 
of directly querying D. In this sense, we may consider 
the class of secrecy instances as representing a logical 
database, given through its models. In such a case, the 
intended answers are those that are true of all the instances 
in the class, and become the so-called certain answers [18].

Definition 12. Let $Q(\bar{x}) \in Conj(\Sigma^{\mathit{null}})$. A tuple $\bar{c}$ of
constants in U is a secret answer to Q from D wrt
a set of secrecy views $\mathcal{V}^s$ iff $\bar{c} \in Q^N(D_s)$ for each
$D_s \in Sec(D, \mathcal{V}^s)$. $SA(Q, D, \mathcal{V}^s)$ denotes the set of all
secret answers. ■

Example 11. (example 10 continued) Consider the query
$Q(x,z) : \exists y\,(P(x,y) \wedge R(y,z) \wedge y < 3)$. According
to Definition 5, it holds: $Q^N(D_1) = \{(\mathit{null}, \mathit{null})\}$,
$Q^N(D_2) = \emptyset$, and $Q^N(D_3) = \emptyset$. These answers can also
be obtained by first rewriting Q, as in (6), into the query
$Q^{rw}(x,z) : \exists y\,(P(x,y) \wedge R(y,z) \wedge y < 3 \wedge y \neq \mathit{null})$,
which can be evaluated on each of the secrecy instances
treating null as any other constant.

We obtain $SA(Q, D, \{V_s\}) = Q^N(D_1) \cap Q^N(D_2) \cap
Q^N(D_3) = \emptyset$. This is as expected, because in this example,
Q is $Q^{V_s}$, the query associated to the secrecy view. ■

The idea behind answering queries from the secrecy
instances (SIs) for D is that the answers are still close
to those we would have obtained from D (because SIs are
maximally close to D). Furthermore, since all the secrecy
views become null on the SIs, the answers returned to any
query, not necessarily to a secrecy view computation, will
take this property into account. In the query answering
part we are using a skeptical or cautious semantics, that
sanctions as true what is simultaneously true in a whole
class of models, or instances in our case (the SIs). Now
we analyze to what extent this approach does protect the
sensitive data. A restricted user may try to pose several
queries to obtain sensitive information.

Example 12. Consider instance $D = \{P(1,2), P(3,4),
R(2,1), R(3,3)\}$ for schema $\mathcal{R} = \{P(A,B), R(B,C)\}$,
and the secrecy view $V_s(x,z) \leftarrow P(x,y), R(y,z)$. In this
case, $V_s^N(D) = \{(1,1)\}$. D has the following SIs:

D_1 = {P(null, 2), P(3, 4), R(2, null), R(3, 3)}
D_2 = {P(1, null), P(3, 4), R(2, 1), R(3, 3)}
D_3 = {P(1, 2), P(3, 4), R(null, 1), R(3, 3)}

The user may pose the queries $Q_1(x,y) : P(x,y)$ and
$Q_2(x,y) : R(x,y)$, trying to reconstruct D. It holds
$Q_1^N(D_1) = \{(\mathit{null},2), (3,4)\}$, $Q_1^N(D_2) = \{(1,\mathit{null}),
(3,4)\}$, $Q_1^N(D_3) = \{(1,2), (3,4)\}$. Then, $SA(Q_1, D, \{V_s\})
= \{(3,4)\}$. Now, $Q_2^N(D_1) = \{(2,\mathit{null}), (3,3)\}$, $Q_2^N(D_2) =
\{(2,1), (3,3)\}$, $Q_2^N(D_3) = \{(\mathit{null},1), (3,3)\}$. Then,
$SA(Q_2, D, \{V_s\}) = \{(3,3)\}$.

By combining the secret answers to $Q_1$ and $Q_2$, it is
not possible to obtain $V_s^N(D)$. For the user who poses the
queries $Q_1$ and $Q_2$, the relations look as follows:



P    A    B          R    B    C
     3    4               3    3



Now, we establish in general the impossibility of ob- 
taining the contents of the secrecy views through the use 
of secret answers to atomic queries (as in the previous 
example). Open atomic queries are the "broader" queries 
we may ask; other queries are obtained from them by 
conjunctive combinations. 

Definition 13. Let $\mathcal{V}^s$ be a set of secrecy views $V_s$.
The secrecy answer instance for $\mathcal{V}^s$ from D is
$D_{\mathcal{V}^s} = \{R(\bar{c}) \mid R \in \mathcal{R} \text{ and } \bar{c} \in SA(R(\bar{x}), D, \mathcal{V}^s)\}$. ■

Here, we are building a database instance by collecting the
secret answers (SAs) to all the atomic queries of the form
$Q(\bar{x}) : R(\bar{x})$, with $R \in \mathcal{R}$. This instance has the same
schema as D.

Example 13. (example 12 continued) Consider the secrecy
view $V_s(x,z) \leftarrow P(x,y), R(y,z)$. It holds:

$$D_{\{V_s\}} = \{P(3,4)\} \cup \{R(3,3)\} = \{P(3,4), R(3,3)\}.$$

Notice that $V_s^N(D_{\{V_s\}}) = \emptyset = SA(Q^{V_s}, D, \{V_s\}) =
\bigcap_{i=1}^{3} (Q^{V_s})^N(D_i) = \{(\mathit{null}, \mathit{null})\} \cap \emptyset = \emptyset$. ■

Proposition 2. For every $V_s$ of the form (2) in $\mathcal{V}^s$,
$SA(Q^{V_s}, D, \mathcal{V}^s) = V_s(D_{\mathcal{V}^s})$. ■

This proposition tells us that by combining SAs to
queries, trying to reconstruct the original instance, we
cannot obtain more information than the one provided by
the SAs (cf. [24, Proposition 2] for a proof).
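
The computations of Examples 12 and 13 can be replayed directly. In the sketch below the three secrecy instances are taken as given (they are not computed here), secret answers to the open atomic queries are obtained as intersections, and the view is then evaluated on the resulting secrecy answer instance. The evaluation function encodes the N-answer semantics for this particular view only, by requiring the join variable to be non-null, in line with the rewriting (6).

```python
NULL = "null"

# Secrecy instances D1, D2, D3 of Example 12 (relation name -> set of tuples)
SIS = [
    {"P": {(NULL, 2), (3, 4)}, "R": {(2, NULL), (3, 3)}},
    {"P": {(1, NULL), (3, 4)}, "R": {(2, 1), (3, 3)}},
    {"P": {(1, 2), (3, 4)}, "R": {(NULL, 1), (3, 3)}},
]

def secret_answers_atomic(rel):
    """Secret answers to the open atomic query on rel:
    the tuples that appear in every secrecy instance."""
    return set.intersection(*(inst[rel] for inst in SIS))

def view_n_answers(inst):
    """N-answers to V_s(x, z) <- P(x, y), R(y, z):
    the join variable y has to be non-null."""
    return {
        (x, z)
        for (x, y) in inst["P"]
        for (y2, z) in inst["R"]
        if y == y2 and y != NULL
    }

# Secrecy answer instance of Definition 13
answer_instance = {rel: secret_answers_atomic(rel) for rel in ("P", "R")}
# Left- and right-hand sides of Proposition 2 for this example
sa_view = set.intersection(*(view_n_answers(inst) for inst in SIS))
view_on_answers = view_n_answers(answer_instance)
```

Both sides of the equality in Proposition 2 come out empty here, so the user who assembles the secret answers to the atomic queries learns nothing about the original view contents.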

The original database D may contain null values, and 
users have to count on that. A restricted user will receive 
as query answers the SAs, which are defined and computed 
through null values. This user could obtain nulls from a
query, and hopefully he will not know if they were already
in D or were (virtually) introduced for privacy purposes.
This is fine and accomplishes our goals, but only as long
as the user does not have other kinds of information.

Example 14. Consider the instance $D = \{P(1,1)\}$, and
the secrecy view $V_s(x) \leftarrow P(x,y), x = 1$. D has only one
secrecy instance $D_s$:

P    A       B
     null    1

For the query $Q(x) : \exists y\,(P(x,y) \wedge x = 1)$ associated to
the secrecy view, the secret answer to $Q(x)$ on D is $\emptyset$.
Now, the secret answer to $Q'(x) : \exists y\,P(x,y)$ is $\{(\mathit{null})\}$.
A user who receives this answer will not know if the null
value was introduced to protect data.

However, if the user knows from somewhere else that 
there is an SQL's NOT NULL constraint or a key constraint 



on the first attribute, and that it is satisfied by D, then he
will know that the received null was not originally in D
and, furthermore, that it is replacing a non-null value. If he
also knows that there is exactly one tuple in the relation
(a COUNT query), and also the secrecy view definition, he
will infer that $(1) \in V_s^N(D)$. ■

In summary, for our approach to work, we rely on the 
following assumptions: 

(a) The user interacts via conjunctive query answering 
with a possibly incomplete database, meaning that the 
latter may contain null values, and this is something 
the former is aware of, and can count on (as with 
databases used in common practice). In this way, if 
a query returns answers with null values, the user 
will not know if they were originally in the database 
or were introduced for protection at query answering 
time. 

(b) The queries request data, as opposed to schema ele- 
ments, like integrity constraints and view definitions. 
Knowing the ICs (and about their satisfaction) in 
combination with query answers could easily expose 
the data protection policy. The most clear example is 
the one of a NOT NULL SQL constraint, when we see 
nulls where there should not be any. 

(c) In particular, the user does not know the secrecy view 
definitions. Knowing them would basically reveal the 
data that is being protected and how. 

These assumptions are realistic and make sense in many 
scenarios, for example, when the database is being accessed 
through the web, without direct interaction with the DBMS 
via complex SQL queries, or through an ontology that offers 
a limited interaction layer. After all, protecting data may 
require additional measures, like withholding from certain 
users certain information that is, most likely, not crucial for 
many applications. From these assumptions and Proposition
2 we can conclude that the user cannot obtain information
about the secrecy views through a combination of SAs
to conjunctive queries. Therefore, there is no leakage of
sensitive information.

V. Secrecy Instances and Logic Programs 

The updates leading to the secrecy instances (SIs) should
not physically change the database. Also, different users
may be restricted by different secrecy views. Rather, the
possibly several SIs have to be virtual, and used mainly
as an auxiliary notion for the secret answer semantics. We
expect to be able to avoid computing all the SIs, materializing
them, and then cautiously querying the class they form. We
would rather stick to the original instance, and use it as it
is to obtain the secret answers.

One way to approach this problem is via query rewriting.
Ideally, a query Q posed to D and expecting secret answers
should be rewritten into another query Q'. This new query
would be posed to D, and the usual answers returned by D
to Q' should be the secret answers to Q. We would like Q'
to be still a simple query, that can be easily evaluated. For
example, if Q' is FO, it can be evaluated in polynomial
time in data. However, this possibility is restricted by
the intrinsic complexity of the problem of computing or
deciding secret answers, which is likely to be higher than
polynomial time in data (cf. Section IVB). In consequence,
Q' may not even be a FO query, let alone conjunctive.

An alternative approach is to specify the SIs in a compact
manner, by means of a logical theory, and do reasoning
from that theory, which is in line with skeptical query
answering. This will not decrease a possibly high intrinsic
complexity, but can be much more efficient than computing
all the secrecy instances and querying them in turn. Wrt
the kind of logical specification needed, we can see that
secret query answering (SQA) is a non-monotonic process.

Example 15. Consider $D = \{P(a)\}$, the secrecy view
$V(x) \leftarrow P(x), R(x)$, and the query $Q : Ans(x) \leftarrow P(x)$.
Here, $V(D) = \emptyset$, and then, D itself is its only SI.
Therefore, $SA(Q, D, \{V\}) = \{(a)\}$.

Let us update D to $D_1 = \{P(a), R(a)\}$. Now, $V(D_1)
= \{(a)\}$. The SIs for $D_1$ are: $D_1' = \{P(\mathit{null}), R(a)\}$
and $D_1'' = \{P(a), R(\mathit{null})\}$. It holds, $Q(D_1') = \{(\mathit{null})\}$
and $Q(D_1'') = \{(a)\}$. Then, $SA(Q, D_1, \{V\}) = \emptyset$. The
previous secret answer is lost. ■

The non-monotonicity of SQA requires a non-monotonic
formalism to logically specify the SIs of a given instance.
Actually, they can be specified as the stable models of a
disjunctive logic program, a so-called secrecy program.

Secrecy programs use annotation constants with the
intended, informal semantics shown in the table below.
More precisely, for each database predicate $R \in \mathcal{R}$, we
introduce a copy of it with an extra, final attribute (or
argument) that contains an annotation constant. So, a tuple
of the form $R(\bar{t})$ would become an annotated atom of the
form $R(\bar{t}, a)$.¹¹ The annotation constants are used to keep
track of virtual updates, i.e. of old and new tuples:



Annotation    Atom         The tuple R(a) ...
u             R(a', u)     is being updated
bu            R(a, bu)     has been updated
t             R(a, t)      is new or old
s             R(a, s)      stays in the secrecy instance



In R(a, bu), annotation bu means that the atom R(a)
has already been updated, and u should appear in the
new, updated atom, say R(a', u). For example, consider
a tuple $R(a, b) \in D$. A new tuple R(a, null) is obtained
by updating b into null. Therefore, R(a, b, bu) denotes the
old atom before updating, while R(a, null, u) denotes the
new atom after the update.

The logic program uses these annotations to go through
different steps, until its stable models are computed. Finally,
the atoms needed to build an SI are read off by restricting
a model of the program to atoms with the annotation s.
As expected, the official semantics of the annotations is
captured through the logic program; the table above is just
for motivation. In Section V-A we provide the general form
of $\Pi(D, \mathcal{V}^s)$, the secrecy logic program that specifies the
SIs for an instance D subject to a set of secrecy views $\mathcal{V}^s$.
The following example illustrates the main ideas and issues.

11 We should use a new predicate, e.g. R', but to keep the notation
simple, we will reuse the predicate. We also omit tuple ids.

Example 16. (example 10 continued) Consider $\mathcal{R} =
\{P(A,B), R(B,C)\}$, $D = \{P(1,2), R(2,1)\}$ and the
secrecy view $V_s(x,z) \leftarrow P(x,y), R(y,z), y < 3$.

The secrecy instance program $\Pi(D, \{V_s\})$ is as follows:

1. P(1, 2). R(2, 1). (initial database)

2. P(null, y, u) ∨ P(x, null, u) ∨ R(null, z, u)
       ← P(x, y, t), R(y, z, t), y < 3, y ≠ null, aux(x, z).
   R(y, null, u) ∨ P(x, null, u) ∨ R(null, z, u)
       ← P(x, y, t), R(y, z, t), y < 3, y ≠ null, aux(x, z).
   aux(x, z) ← P(x, y, t), R(y, z, t), y < 3, x ≠ null.
   aux(x, z) ← P(x, y, t), R(y, z, t), y < 3, z ≠ null.

3. P(x, y, bu) ← P(x, y, t), R(y, z, t), y < 3, y ≠ null,
                 aux(x, z), P(null, y, u), x ≠ null.
   R(y, z, bu) ← P(x, y, t), R(y, z, t), y < 3, y ≠ null,
                 aux(x, z), R(y, null, u), z ≠ null.
   P(x, y, bu) ← P(x, y, t), R(y, z, t), y < 3, y ≠ null,
                 aux(x, z), P(x, null, u).
   R(y, z, bu) ← P(x, y, t), R(y, z, t), y < 3, y ≠ null,
                 aux(x, z), R(null, z, u).

4. P(x, y, t) ← P(x, y).        P(x, y, t) ← P(x, y, u).
   R(x, y, t) ← R(x, y).        R(x, y, t) ← R(x, y, u).

5. P(x, y, s) ← P(x, y, t), not P(x, y, bu).
   R(x, y, s) ← R(x, y, t), not R(x, y, bu).

The facts in 1. belong to the initial instance D, and become
annotated right away with t by rules 4. The most important
rules of the program are those in 2. and 3. They enforce
the update semantics of secrecy in the presence of null and
using null. Rules in 2. capture in the body the violation of
secrecy (i.e. a non-null view contents); and in the head, the
intended way of restoring secrecy: We can either update a
combination of (combination) attributes or single secrecy
attributes with null. In this example, we need to update,
with null, values in attribute B or in attributes A and C,
simultaneously.

Since disjunctive programs do not allow conjunctions in
the head, the intended head (P(null, y, u) ∧ R(y, null, u)) ∨
P(x, null, u) ∨ R(null, z, u) ← Body is represented by
means of two rules, as in 2.: P(null, y, u) ∨ P(x, null, u) ∨
R(null, z, u) ← Body and R(y, null, u) ∨ P(x, null, u) ∨
R(null, z, u) ← Body.

Furthermore, we need to restore secrecy only if the given
database is not already a secrecy instance, which happens
when the combination attribute B is not null, the secrecy
attributes A and C are not both null, and formula φ is true.
Predicate aux(x, z) defined in 2. captures the condition
not (x = null ∧ z = null).

The rules in 3. collect the tuples in the database that have
already been updated and (virtually) no longer exist in the
database. Rules 4. annotate the original atoms and also
the new versions of updated atoms. Rules in 5. collect the
tuples that stay in the final state of the updated database:
They are original or new, but have never been updated. ■

The secrecy instances are in one-to-one correspondence
with the restrictions to s-annotated atoms of the stable
models of $\Pi(D, \mathcal{V}^s)$.¹²

Example 17. (example 16 continued) The program has
three stable models (the facts in 1. are omitted):

M_1 = {P(1, 2, t), R(2, 1, t), aux(1, 1), P(1, 2, s),
       R(2, 1, bu), R(null, 1, u), R(null, 1, t), R(null, 1, s)}.

M_2 = {P(1, 2, t), R(2, 1, t), aux(1, 1), P(1, 2, bu),
       R(2, 1, s), P(1, null, u), P(1, null, t), P(1, null, s)}.

M_3 = {P(1, 2, t), R(2, 1, t), aux(1, 1), P(1, 2, bu),
       R(2, 1, bu), P(null, 2, u), R(2, null, u), P(null, 2, t),
       R(2, null, t), aux(1, null), aux(null, 1), P(null, 2, s),
       R(2, null, s)}.

The secrecy instances are built by selecting the s-annotated
atoms, obtaining: $D_1 = \{P(1,2), R(\mathit{null},1)\}$, $D_2 =
\{P(1,\mathit{null}), R(2,1)\}$, and $D_3 = \{P(\mathit{null},2), R(2,\mathit{null})\}$.
They coincide with those in Example 10. ■
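
Reading a secrecy instance off a stable model is a simple projection. The sketch below applies it to the model $M_1$ of Example 17; the encoding of atoms as Python tuples of the form (predicate, arguments..., annotation) is our own convention for the illustration.

```python
NULL = "null"

# Stable model M1 of Example 17: (predicate, arg1, ..., annotation)
M1 = {
    ("P", 1, 2, "t"), ("R", 2, 1, "t"), ("aux", 1, 1), ("P", 1, 2, "s"),
    ("R", 2, 1, "bu"), ("R", NULL, 1, "u"), ("R", NULL, 1, "t"),
    ("R", NULL, 1, "s"),
}

def secrecy_instance(model):
    """Keep the database atoms annotated with s and drop the annotation."""
    return {
        atom[:-1]
        for atom in model
        if atom[0] != "aux" and atom[-1] == "s"
    }

D1 = secrecy_instance(M1)
```

The result contains P(1, 2) and R(null, 1), i.e. the instance $D_1$ listed in Example 17.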

In order to compute secret answers to a query, it is 
not necessary to explicitly compute all the stable models. 
Instead, the query can be posed directly on top of the 
program and answered according to the skeptical semantics. 
This will return the secret answers to the query. The 
query has to be formulated as a top-layer program, with 
s-annotated atoms, that are those that affect the query. A 
system like DLV can be used. It computes the disjunctive
stable-model semantics, with an interface to commercial
DBMSs [22].

Example 18. (example 17 continued) We want the secret
answers to the conjunctive query

$$Q(x,z) :\ \exists y\,(P(x,y) \wedge R(y,z) \wedge y < 3).$$

This requires first rewriting it, as in (6), into $Q^{rw}(x,z) :
\exists y\,(P(x,y) \wedge R(y,z) \wedge y < 3 \wedge y \neq \mathit{null})$. This new query
can be evaluated against instances with null treated as any
other constant. In its turn, $Q^{rw}$ is transformed into a query
program with all the database atoms using annotation s:

Ans(x, z) ← P(x, y, s), R(y, z, s), y < 3, y ≠ null.

This one is evaluated in combination with the secrecy
program in Example 16 under the skeptical semantics. In
this evaluation, null is treated as an ordinary constant. ■

A. The general secrecy logic program 

To provide the general form of secrecy logic programs, we
need to introduce some notation first. We recall that our
view definitions are of the form

$$V_s(\bar{x}) \leftarrow R_1(\bar{x}_1), \ldots, R_n(\bar{x}_n),\ \varphi. \qquad (8)$$

12 The proof of this claim is rather long, and is similar in spirit to the
proof of the fact that database repairs wrt integrity constraints [3] can
be specified by means of disjunctive logic programs with stable model
semantics (cf. [10], [2]).



Some of the variables¹³ in atoms in the body of the
definitions are relevant, as in Definition 8, and their values
will be replaced by null. As expected, and illustrated in
Example 10, those atoms and variables play a crucial role
in the program.

For an atom of the form $R(\bar{x})$ and variables $\bar{y} \subseteq \bar{x}$,
$R(\bar{x})\frac{\bar{y}}{\mathit{null}}$ denotes $R(\bar{x})$ with all the variables in $\bar{y}$ replaced
by null. In reference to (8), with this notation, we define:

$$\mathcal{CT}(V_s) = \Big\{R_i(\bar{x}_i)\tfrac{\bar{y}}{\mathit{null}} \ \Big|\ R_i(\bar{x}_i) \text{ is in the body of (8)},\ \bar{y} = \{y_1, \ldots, y_n\} \subseteq \bar{x}_i, \text{ and } y_l \in C(V_s)\Big\}.$$

$$\mathcal{ST}(V_s) = \Big\{R_i(\bar{x}_i)\tfrac{\bar{y}}{\mathit{null}} \ \Big|\ R_i(\bar{x}_i) \text{ is in the body of (8)},\ \bar{y} = \{y_1, \ldots, y_n\} \subseteq \bar{x}_i, \text{ and } y_l \in S(V_s)\Big\}.$$

For the sets of predicate positions, $C(V_s)$ and $S(V_s)$, see
Definition 8. The atom sets $\mathcal{CT}(V_s)$ and $\mathcal{ST}(V_s)$ will be
used in the head of the disjunctive rules that change some
relevant attribute values into nulls (rules 2. in Example 16).

Example 19. For the secrecy view $V_s(x,z,w) \leftarrow
P(x,y), Q(y,z,w)$, it holds: $C(V_s) = \{P[2], Q[1]\}$
and $S(V_s) = \{P[1], Q[2], Q[3]\}$. Thus, $\mathcal{CT}(V_s) =
\{P(x, \mathit{null}), Q(\mathit{null}, z, w)\}$, and $\mathcal{ST}(V_s) = \{P(\mathit{null}, y),
Q(y, \mathit{null}, \mathit{null})\}$. ■

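Given a database instance D, a set $\mathcal{V}^s$ of secrecy views
$V_s$, each of them of the form (8), the secrecy program
$\Pi(D, \mathcal{V}^s)$ contains the following rules:

1. Facts: $R(\bar{c}, \mathbf{t})$ for each atom $R(\bar{c}) \in D$.

2. For every $V_s$ of the form (8), if $\mathcal{ST}(V_s) = \{R^1(\bar{x}_1),
\ldots, R^a(\bar{x}_a)\}$, and $\mathcal{CT}(V_s) = \{R^1(\bar{x}_1), \ldots, R^b(\bar{x}_b)\}$, then
the program contains the rules:

(a) If $S(V_s) \cap C(V_s) \neq \emptyset$, the rule:

$$\bigvee_{R^c \in \mathcal{CT}(V_s)} R^c(\bar{x}_c, \mathbf{u}) \ \leftarrow\ \bigwedge_{i=1}^{n} R_i(\bar{x}_i, \mathbf{t}),\ \varphi,\ \bigwedge_{v_l \in C(V_s)} v_l \neq \mathit{null}.$$

(b) If $S(V_s) \cap C(V_s) = \emptyset$, for each $R^d \in \mathcal{ST}(V_s)$, $1 \leq d \leq a$,
the rule:

$$R^d(\bar{x}_d, \mathbf{u}) \vee \bigvee_{R^c \in \mathcal{CT}(V_s)} R^c(\bar{x}_c, \mathbf{u}) \ \leftarrow\ \bigwedge_{i=1}^{n} R_i(\bar{x}_i, \mathbf{t}),\ \varphi,\ \bigwedge_{v_l \in C(V_s)} v_l \neq \mathit{null},\ aux_{V_s}(\bar{x}).$$

Plus rules defining the auxiliary predicates: If $S(V_s) =
\{x^1, \ldots, x^k\}$ and $\bar{x} = (x^1, \ldots, x^k)$, then for each $1 \leq i \leq k$,
the rule

$$aux_{V_s}(\bar{x}) \ \leftarrow\ \bigwedge_{j=1}^{n} R_j(\bar{x}_j, \mathbf{t}) \wedge \varphi \wedge x^i \neq \mathit{null}.$$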

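3. The old tuple collecting rules:

(a) For each $R^j \in \mathcal{ST}(V_s)$, $1 \leq j \leq a$:

$$R^j(\bar{x}_j, \mathbf{bu}) \ \leftarrow\ \bigwedge_{i=1}^{n} R_i(\bar{x}_i, \mathbf{t}),\ \varphi,\ aux_{V_s}(\bar{x}),\ \bigwedge_{v_l \in C(V_s)} v_l \neq \mathit{null},\ R^j(\bar{x}_j, \mathbf{u}),\ \bigwedge_{x^l \in \bar{x}_j \cap S(V_s)} x^l \neq \mathit{null}.$$

(b) For each $R^c \in \mathcal{CT}(V_s)$, $1 \leq c \leq b$:

$$R^c(\bar{x}_c, \mathbf{bu}) \ \leftarrow\ \bigwedge_{i=1}^{n} R_i(\bar{x}_i, \mathbf{t}),\ \varphi,\ aux_{V_s}(\bar{x}),\ \bigwedge_{v_l \in C(V_s)} v_l \neq \mathit{null},\ R^c(\bar{x}_c, \mathbf{u}).$$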

13 To be more precise, we should talk about variables in relevant
positions or arguments, as we did before, e.g. in Section III, but the
description would be less intuitive.
4. For each $R \in \mathcal{R}$, the rule: $R(\bar{x}, \mathbf{t}) \leftarrow R(\bar{x}, \mathbf{u})$.

5. For each $R \in \mathcal{R}$, the rule:

$$R(\bar{x}, \mathbf{s}) \ \leftarrow\ R(\bar{x}, \mathbf{t}),\ not\ R(\bar{x}, \mathbf{bu}).$$

Rules in 1. create program facts from the initial instance. 
Rules in 2. are the most important and express how to 
impose secrecy by changing attribute values into nulls. 
Notice that, by definition, $\mathcal{CT}(V_s)$ and $\mathcal{ST}(V_s)$
already include those changes. The body of the rule becomes
true when the database instance does not nullify
the view, and the head captures the intended ways of 
imposing secrecy. Rules in 3. collect the tuples in the 
database that have already been updated and (virtually) no 
longer exist in the database. Rules 4. capture the atoms that 
are part of the database or updated atoms in the process 
of imposing secrecy. Rules in 5. collect the tuples in the 
secrecy instance, as those that did not become old. 

The same secrecy program can be used with different
queries. However, available optimization techniques can be
used to specialize the program for a given query (cf. [11],
[5] for this kind of optimization for repair logic programs).

VI. The CQA Connection 

Consider a database instance D that fails to satisfy a 
given set of integrity constraints IC. It still contains
useful, and partly semantically correct, information. The
area of consistent query answering (CQA) [3] has to
do with: (a) characterizing the information in D that is
still semantically correct wrt IC, and (b) characterizing,
and in particular computing, the semantically correct, i.e.
consistent, answers to a query Q from D wrt IC. The first
goal is achieved by proposing a repair semantics, i.e. a class 
of alternative instances to D that are consistent wrt IC and 
minimally depart from D. The consistent information in D 
is the one that is invariant under all the repairs in the class. 
This applies in particular to the consistent answers: They 
should hold in every minimally repaired instance. 

There are some connections between CQA and our
treatment of privacy preserving query answering. Notice
that every secrecy view definition, of the form
$V_s(\bar{x}) \leftarrow R_1(\bar{x}_1), \ldots, R_n(\bar{x}_n), \varphi$, can be seen
as an integrity constraint expressed in the FO language
$L(\Sigma \cup \{V_s\})$:

$$\forall \bar{x}\,(V_s(\bar{x}) \leftrightarrow \exists \bar{y}\,(R_1(\bar{x}_1) \wedge \cdots \wedge R_n(\bar{x}_n) \wedge \varphi)), \tag{9}$$

with $\bar{y} = (\bigcup_i \bar{x}_i) \setminus \bar{x}$. From this perspective, the problem
of view maintenance, i.e. of maintaining the view defined
by (9) synchronized with the base relations [17], becomes
a problem of database maintenance, i.e. maintenance of
the consistency of the database wrt (9) seen as an IC. This
also works in the other direction, since every IC can be
associated to a violation view, which has to stay empty for
the IC to stay satisfied.

Actually, we want more than maintaining the view defined
in (9). We want it to be empty or to return only tuples
with null values. In consequence, we have to impose the
following ICs on D, which are obtained from the RHS of
(9): If $\bar{x}$ is $x^1, \ldots, x^k$, then for $1 \leq i \leq k$,

$$\forall \bar{x}\bar{y}\, \neg(R_1(\bar{x}_1) \wedge \cdots \wedge R_n(\bar{x}_n) \wedge \varphi \wedge x^i \neq \mathit{null}). \tag{10}$$

That is, from each view definition (9) we obtain k denial
constraints (DCs), i.e. prohibited conjunctions of (positive)
database atoms and built-ins. DCs have been investigated
in CQA under several repair semantics [14], [?].

In our case, the secrecy instances correspond to the
repairs of D wrt the set of DCs in (10). These repairs are
defined according to the null-based (and attribute-based
[?]) repair semantics of Section III, i.e. $\leq_D$-minimality (cf.
Example 10). Through this correspondence we can benefit
from concepts and techniques developed for CQA.

Example 20. The secrecy view defined by

$$V_s(x, z) \leftarrow P(x, y),\ R(y, z),\ y < 3$$

gives rise to the following denial constraints:
$\neg\exists xyz\,(P(x,y) \wedge R(y,z) \wedge y < 3 \wedge x \neq \mathit{null})$ and
$\neg\exists xyz\,(P(x,y) \wedge R(y,z) \wedge y < 3 \wedge z \neq \mathit{null})$. An instance
D has to be minimally repaired in order to satisfy them. ■
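
The two denial constraints of Example 20 can be checked on a
concrete instance with a few lines of Python. This is a minimal
sketch under our own assumptions: relations are sets of pairs,
`None` stands for null, and a join or comparison involving null
never holds. The names `view_tuples` and `violated_dcs`, and the
sample instances, are hypothetical. The check only tests
satisfaction of the constraints, not minimality of the changes.

```python
NULL = None  # assumed stand-in for the SQL null


def view_tuples(P, R):
    """Evaluate V_s(x, z) <- P(x, y), R(y, z), y < 3.

    Joins and comparisons that involve null are treated as not holding.
    """
    return {
        (x, z)
        for (x, y1) in P
        for (y2, z) in R
        if y1 is not NULL and y2 is not NULL and y1 == y2 and y1 < 3
    }


def violated_dcs(P, R):
    """Return the secrecy attributes ('x', 'z') whose denial constraint fails."""
    answers = view_tuples(P, R)
    violated = set()
    if any(x is not NULL for (x, _) in answers):
        violated.add("x")
    if any(z is not NULL for (_, z) in answers):
        violated.add("z")
    return violated


# Hypothetical instance violating both constraints.
print(violated_dcs({(1, 2)}, {(2, 1)}))
# Two hypothetical null-based updates that satisfy both constraints.
print(violated_dcs({(1, NULL)}, {(2, 1)}))
print(violated_dcs({(NULL, 2)}, {(2, NULL)}))
```

In the last call the view still returns a tuple, but it shows only
nulls, so neither constraint is violated.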

VII. Related Work 

Other researchers have investigated the problem of data
privacy and access control in relational databases. We
described earlier in the paper the approach based on
authorization views [27], [33]. In [19], privacy is specified
through the values in cells within tables that can be accessed
by a user. To answer a query Q without violating privacy,
they propose the table and query semantics models, which
generate masked versions of the tables by replacing all the
cells that are not allowed to be accessed with NULL. When
the user issues Q, the latter is posed to the masked versions
of the tables, and answered as usual. The table semantics
is independent of any queries and views, whereas the
query semantics takes queries into account. The
implementation of the two models, based on query rewriting,
is shown in [19].

Recent work [30] has presented a labeling approach for
masking unauthorized information by using two types of
special variables. They propose a secure and sound query
evaluation algorithm for the case of cell-level disclosure
policies, which determine for each cell whether it
is allowed to be accessed or not. The algorithm is based on
query modification, into a query that returns less information
than the original one. Those approaches propose query
rewriting to enforce fine-grained access control in databases.
Their approach is mainly algorithmic.

Data privacy and access control in incomplete propositional
databases have been studied in [6], [?], [31]. They take
a different approach, controlled query evaluation (CQE), to
fine-grained access control. It is policy-driven, and aims to
ensure confidentiality on the basis of a logical framework.
A security policy specifies the facts that a certain user is
not allowed to access. Each query posed to the database
by that user is checked, as to whether the answers to it
would allow the user to infer any sensitive information. If
that is the case, the answer is distorted by either lying or
refusal or combined lying and refusal. In [8], they extend
CQE to restricted incomplete FO logic databases via a
transformation into a propositional language. This approach
seems to be incomparable to ours. They do not use null
values, and the issue of maximality of answers that do not
compromise privacy is not explicitly addressed.

Our approach is based on producing virtual updates on
the database, by forcing the secrecy views to become empty
or to show only null values. This is clearly reminiscent of
the older, but still challenging, database problem of updating
a database through views [13]. Here we confront new
difficulties, namely the occurrence of SQL nulls with a
special semantics, and the minimality of null-based changes
on the base relations.

In [9] a null-based repair semantics was introduced, but
it differs from the one introduced in Section III. The former
was proposed for enforcing satisfaction of sets of ICs that
include referential ICs, which require the possible insertion
of new tuples with nulls. The comparison between instances
is based on sets of full tuples and also on the occurrence
of nulls in them. Here, we enforce secrecy by changes of
attribute values only.

A representation of null values in logic programs with
stable model semantics is proposed in [28], whose aim is to
capture the intended semantics of null values à la Reiter, i.e.
as found in his logical reconstruction of relational databases
[26]. Two remarks have to be made here. First, Reiter
reconstructs "logical" nulls, but not SQL nulls. In our work
we use the latter, as done in database practice. Second, we
take care of nulls by proposing a new query answering
semantics that can be captured in classic logical terms via
query rewriting. The rewritten queries are the input to a
logic program, which then treats nulls as ordinary constants
(without having to give a logical account of them).

VIII. Conclusions 

In this work, we have developed a logical framework and a 
methodology for answering conjunctive queries that do not 
reveal secret information as specified by secrecy views. Our 
work is of a foundational nature, and attempts to provide a 
theoretical basis, or at least part of that basis, for possible 
technological developments. Implementation efforts and ex- 
periments, beyond the proof-of-concept examples we have 
run with DLV, are left for future work. 

We have concentrated on conjunctive secrecy views and
conjunctive queries. We have assumed that the databases
may contain nulls, and that nulls are also used to protect
secret information, by virtually updating some of the
attribute values with nulls. In each of the resulting alternative
virtual instances, the secrecy views either become empty or
contain a tuple showing only null values. The queries can be
posed against any of these virtual instances or, cautiously,
against all of them simultaneously. The latter guarantees privacy.

The update semantics enforces (or captures) two natural
requirements: that the updates are based on null values, and
that the updated instances stay close to the given instance.
In this way, the query answers become implicitly maximally
informative, while not revealing the original contents of the
secrecy views.



The null values are treated as in the SQL standard,
which in our case, and for conjunctive query answering,
is reconstructed in classical logic. This reconstruction
captures well the "semantics" of SQL nulls (which is not
clear or complete in the standard), at least for the case of
conjunctive query answering, and some extensions thereof.
This is the main reason for concentrating on conjunctive
queries and views. In this case, queries and views can
be syntactically transformed into conjunctive queries and
views for which the evaluation or verification can be done
by treating nulls as any other constant.
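
The idea of evaluating a rewritten query while treating null as
an ordinary constant can be illustrated by comparing it with an
actual SQL engine. The following is a minimal sketch, not the
rewriting defined earlier in the paper: we assume, for the view
V_s(x, z) <- P(x, y), R(y, z), y < 3, that the rewriting amounts
to adding the condition y != null for the join variable. The
function names, the column names a and b, and the sample
instances are hypothetical. SQLite (through Python's standard
`sqlite3` module) plays the role of the SQL system.

```python
import sqlite3

NULL = "null"  # null represented as an ordinary constant


def to_sql(rel):
    """Map the constant 'null' to Python None, which sqlite3 stores as SQL NULL."""
    return {tuple(None if v == NULL else v for v in tup) for tup in rel}


def sql_view(P, R):
    """Evaluate the view with SQLite, i.e. under genuine SQL null semantics."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE P (a, b)")
    con.execute("CREATE TABLE R (a, b)")
    con.executemany("INSERT INTO P VALUES (?, ?)", P)
    con.executemany("INSERT INTO R VALUES (?, ?)", R)
    rows = con.execute(
        "SELECT P.a, R.b FROM P JOIN R ON P.b = R.a WHERE P.b < 3"
    ).fetchall()
    con.close()
    return set(rows)


def rewritten_view(P, R):
    """Evaluate the assumed rewritten view, comparing 'null' like any constant.

    The added condition y != NULL blocks joins through null and also guards
    the built-in comparison y < 3.
    """
    return {
        (x, z)
        for (x, y) in P
        for (y2, z) in R
        if y == y2 and y != NULL and y < 3
    }


cases = [
    ({(1, 2)}, {(2, 1)}),
    ({(1, NULL)}, {(NULL, 1)}),  # without y != null this pair would join
    ({(NULL, 2)}, {(2, NULL)}),
]
for P, R in cases:
    agree = sql_view(to_sql(P), to_sql(R)) == to_sql(rewritten_view(P, R))
    print(P, R, "agree" if agree else "differ")
```

The second case shows why the extra condition is needed: as plain
constants the two nulls would be equal and the tuples would join,
which SQL does not do.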

The secret answers are based on a skeptical semantics. 
In principle, we could consider instead the more relaxed 
possible or brave semantics: an answer would be returned if 
it holds in some of the secrecy instances. The possibly secret 
answers would provide more information about the original 
database than the (certainly) secret answers. However, they
are not suitable for our privacy problem.

Example 21. (Example 10 continued) A possibly secret
answer to the query $Q_1(x, y): P(x, y)$ is (1,2), obtained
from $D_3$. Similarly, (2,1) is a possibly secret answer to
$Q_2(x, y): R(x, y)$. From these possibly secret answers,
the user can obtain the contents of the secrecy view. ■
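
The difference between the skeptical and the brave semantics
can be made concrete with a small Python sketch. We assume a
hypothetical instance D = {P(1,2), R(2,1)} with the secrecy view
V_s(x, z) <- P(x, y), R(y, z), y < 3, and the three null-based
updates listed in the code as its secrecy instances; their order
is arbitrary and need not match the numbering used in the
examples. The function names and the use of `None` for null are
also our assumptions.

```python
NULL = None  # assumed stand-in for the SQL null

# Hypothetical secrecy instances of D = {P(1,2), R(2,1)}.
instances = [
    {"P": {(1, NULL)}, "R": {(2, 1)}},
    {"P": {(1, 2)}, "R": {(NULL, 1)}},
    {"P": {(NULL, 2)}, "R": {(2, NULL)}},
]


def cautious(instances, rel):
    """Answers to the query rel(x, y) that hold in every instance."""
    return set.intersection(*(inst[rel] for inst in instances))


def brave(instances, rel):
    """Answers to the query rel(x, y) that hold in some instance."""
    return set.union(*(inst[rel] for inst in instances))


def join_view(P, R):
    """V_s(x, z) <- P(x, y), R(y, z), y < 3, with nulls never joining."""
    return {
        (x, z)
        for (x, y) in P
        for (y2, z) in R
        if y is not NULL and y == y2 and y < 3
    }


print(cautious(instances, "P"), cautious(instances, "R"))
leaked = join_view(brave(instances, "P"), brave(instances, "R"))
print((1, 1) in leaked)
```

Under these assumptions the cautious answers to both queries are
empty, whereas combining the brave answers (1,2) and (2,1) lets
the user reconstruct the original view tuple (1,1).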

We introduced disjunctive logic programs with stable 
model semantics to specify the secrecy instances. This 
is a single program that can be used to compute secret 
answers to any conjunctive query. This provides a general 
mechanism, but may not be the most efficient way to go for 
some classes of secrecy views and queries. Ad hoc methods 
could be proposed for them, as has been the case in CQA
[?], [?].

Our work leaves several open problems, and they are
a matter of ongoing and future research. Complexity issues
have to be explored, for example, the complexity of deciding
whether or not a particular instance is a secrecy instance of
an original instance, and of deciding if a tuple is a secret
answer to a query. The connection with CQA, where similar
problems have been investigated, looks very promising in this
regard.

Another problem is about query rewriting, i.e. about the 
possibility of rewriting the original query into a new FO 
query, in such a way that the new query, when answered 
by the given instance, returns the secret answers. From the 
connection with CQA we can predict that this approach 
has limited applicability, but whenever possible, it should 
be used, for its simplicity and lower complexity. 

For future work, it would be interesting to investigate the
connections with view determinacy [25], which has to do with
the possible determination of extensions of query answers
by a set of views with fixed contents. The occurrence of
SQL nulls and their semantics introduces a completely new
dimension into this problem.

A natural extension of this work would go in the direction
of freeing ourselves from the assumptions listed
at the end of Section IV. Their relaxation would create
a challenging new scenario, and most likely, would require
a non-straightforward modification of our approach. One
of these possible relaxations consists in the addition of ICs
to the schema. If the user knows these ICs and, most
importantly, knows that they are satisfied by the database,
then privacy could be compromised. Also, the virtual updates
should take these ICs into account, so as to produce
consistent secrecy instances.

It would also be interesting to investigate more expressive 
queries and secrecy views, going beyond the conjunctive 
case. However, if we allow negation, the challenges become 
intrinsically more difficult. On one side, in the case of se- 
crecy views, negation becomes a fundamental complication 
for privacy 11271 . Il33l . On the other, the query rewriting 
methodology that captures nulls as ordinary constants (cf. 
Section IH-BI) that we have used in our work does not 
include the combination of nulls and negation. The exten- 
sion of our privacy approach to queries or secrecy views 
with negation would make it necessary to first attempt an 
extension of this kind of query rewriting. However, this 
requires to agree on a sensible semantics for SQL nulls in 
the context of such more expressive queries, something that 
is definitely worth investigating. 
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